1.  INTRODUCTION

This report briefly reviews the work performed under Army Research Office
contract No. DAAG29-83-C-0027 on the development of parametric techniques for
multichannel signal processing. The results are summarized in a number of
papers, which are enclosed as appendices A-M.

1.1  MULTICHANNEL  SIGNAL  PROCESSING 

Most of the work in the area of signal processing (in particular adaptive
signal processing) is concerned with the single channel case: the design and
analysis of filters with a single input and a single output (SISO). This type
of processing is naturally suited to situations involving a scalar time series
such as the video signal in a radar system or the output of a communication
receiver. Many problems of great practical interest involve vector time
series such as the signals in an acoustic or seismic array. Optimal
prediction/estimation of such signals will usually require multi-input
multi-output (MIMO) filters. Because of the higher complexity (both
conceptual and computational) of MIMO filters, they are often replaced by
suboptimal single channel processors.

In  some  recent  work  we  developed  a  multichannel  processor  for  the  problem 
of  estimating  the  parameters  (location  and  spectrum)  of  multiple  targets  from 
multi-sensor  data  [10].  Preliminary  simulation  results  indicated  that 
significant  performance  improvements  are  achievable  by  performing  optimal 
multichannel  processing  instead  of  the  more  conventional  single  channel 
processing.  These  initial  positive  results  motivated  us  to  study  further  the 
design  and  analysis  of  MIMO  filters  and  their  applications. 

The advent of powerful VLSI processors makes it feasible to consider the
more complex MIMO signal processing architectures. The theoretical framework
necessary for the development of multichannel processing techniques is
currently available; researchers in system theory and modern control have been
treating MIMO problems for the past two decades. We feel, therefore, that the
time is right for the development and application of optimal multichannel
signal processing techniques.


1.2  THE  PARAMETRIC  APPROACH 


Autoregressive moving-average (ARMA) models are widely used in the
statistical analysis of time series. In signal processing, autoregressive
(AR) techniques have been used for high resolution spectral estimation, linear
predictive coding, and (implicitly) in various adaptive filtering applications
[1]-[3]. The use of ARMA models and the related infinite impulse response
(IIR) prediction filters has been much more limited due to the difficulty of
reliably estimating the parameters of such models from noisy data. Practical
applications of these techniques have been limited to the single channel case.

In some recent work we applied (scalar) ARMA modeling techniques borrowed
from the area of system identification to signal processing problems such as
adaptive line enhancement, adaptive noise cancelling, adaptive deconvolution,
and spectral estimation [4]-[7]. We also developed a very robust non-adaptive
ARMA parameter estimation technique which was used for high resolution
spectral estimation [11]. Other ARMA spectral estimation techniques were
reported in [8]-[9]. Based on the accumulated experience with AR and ARMA
signal processing techniques it seems that the single channel case is
reasonably well developed by now. (It should be noted, however, that many
questions are still open in the area of ARMA modeling for adaptive IIR
filtering.)

The natural next step is to extend techniques for ARMA modeling to the
MIMO case and to use them for designing multichannel signal processors. The
main thrust of our research was, therefore, the development of robust
estimation techniques for MIMO ARMA parameters. Once these parameters are
estimated, they can be used to design MIMO filters for a variety of
applications, as was shown in [4]-[7] for the single channel case. The
problem of estimating MIMO ARMA parameters involves difficulties which were
not present in the SISO case. These difficulties are related to the complex
structure of MIMO systems and to questions of non-uniqueness of the
representation of vector time-series.


2.  PROJECT  PUBLICATIONS 


The  following  is  a  list  of  publications  summarizing  the  work  performed  on 
this  project.  The  key  publications  are  included  as  appendices  to  this  report. 

In  this  project  we  developed  a  number  of  accurate  ARMA  estimation 
techniques  which  can  be  used  for  single  and  multichannel  problems.  These 
techniques  require  a  modest  amount  of  computation  compared  to  a  full-blown 
maximum  likelihood  technique.  We  have  also  developed  asymptotic  performance 
bounds  that  make  it  possible  to  evaluate  the  accuracy  of  these  techniques. 

The  results  of  this  work  are  summarized  in  more  than  30  project  publications 
(see  Section  2)  and  the  key  results  are  included  in  this  report  in  appendices 
A-M.  These  results  have  a  wide  range  of  applications  in  the  area  of 
surveillance,  communications,  and  statistical  signal  processing. 

2.1  PUBLISHED  JOURNAL  PAPERS 

1. B. Friedlander and B. Porat, "Some Bounds for the Estimation of
Autoregressive Signals in White Noise," Signal Processing, No. 8, pp.
291-302, 1985.

2. P. Stoica, T. Soderstrom, "Optimal Instrumental
Variable Estimates of the AR Parameters of an ARMA Process," IEEE Trans.
Automatic Control, Vol. AC-30, No. 11, pp. 1065-1075, November 1985.

2.2  ACCEPTED  FOR  PUBLICATION  IN  JOURNALS 

3. P. Stoica, B. Friedlander and T. Soderstrom, "Least-Squares, Yule-Walker
and Overdetermined Yule-Walker Estimation of AR Parameters: A Monte Carlo
Study of Finite Sample Properties," Int. J. Control, to appear.

4.  B.  Porat  and  B.  Friedlander,  "Computation  of  the  Exact  Information  Matrix 
for  Gaussian  Time  Series  with  Stationary  Random  Components,"  IEEE  Trans. 
Acoustics,  Speech  and  Signal  Processing,  to  appear. 

5. B. Friedlander and B. Porat, "Multichannel Spectral Analysis Using the
Modified Yule-Walker Equations," J. Signal Processing, Special Issue on
Spectral Estimation, to appear.

2.3  UNDER  REVIEW 

6. P. Stoica, B. Friedlander and T. Soderstrom, "Optimal Instrumental
Variable Multistep Algorithms for the Estimation of AR Parameters of an
ARMA Process."

7. P. Stoica, B. Friedlander and T. Soderstrom, "An Approximate Maximum
Likelihood Estimator of ARMA Parameters."

8. P. Stoica, B. Friedlander and T. Soderstrom, "Maximum Likelihood
Estimation of the Parameters of Multiple Sinusoids in Noise."

9. B. Porat and B. Friedlander, "Adaptive Detection of Deterministic
Transient Signals."

10. B. Porat and B. Friedlander, "Asymptotic Performance Analysis of ARMA
Parameter Estimation Methods Based on Sample Covariances," IEEE Trans.
Automatic Control.

11. B. Porat and B. Friedlander, "The Exact Cramer-Rao Bound for Gaussian
Autoregressive Processes," IEEE Trans. Information Theory.

12. P. Stoica, B. Friedlander and T. Soderstrom, "On Instrumental Variable
Estimation of Sinusoid Frequencies and the Parsimony Principle," IEEE
Trans. Acoustics, Speech and Signal Processing.

13. B. Friedlander, P. Stoica and T. Soderstrom, "Instrumental Variable
Methods for ARMA Models," Chapter in Vol. XXIV of "Advances in Control and
Dynamic Systems."

14. B. Porat and B. Friedlander, "Parameter Estimation of Continuous-Time
Stationary Gaussian Processes with Rational Spectra," IEEE Trans.
Acoustics, Speech and Signal Processing.

Abstract. The Cramer-Rao lower bound (CRLB) provides a useful reference for evaluating the performance of parameter
estimation techniques. This paper considers the problem of estimating the parameters of an autoregressive signal corrupted
by white noise. An explicit formula is derived for computing the asymptotic CRLB for the signal and noise parameters.
Formulas for the asymptotic CRLB for functions of the signal and noise parameters are also presented. In particular, the
center frequency, bandwidth and power of a second order process are considered. Some numerical examples are presented
to illustrate the usefulness of these bounds in studying estimation accuracy.

Zusammenfassung. The Cramer-Rao lower bound (CRLB) provides a useful reference for evaluating the performance of
parameter estimation techniques. This communication considers the problem of estimating the parameters of an
autoregressive signal in white noise. An explicit formula is given for the asymptotic CRLB of the signal
and noise parameters. Formulas for the CRLB of functions of the signal and noise parameters are also given.
In particular, the center frequency, bandwidth and power of a second order process are treated. Numerical
examples are given to show the usefulness of these bounds in studying estimation accuracy.

Resume. The Cramer-Rao bound is a useful means of evaluating the efficiency of an estimation method. This
article studies the estimation of models of the AR-plus-noise type. Explicit formulas are given that allow these
Cramer-Rao bounds to be computed in a numerically efficient manner. The case of the center frequency and bandwidth
of a frequency buried in noise is examined more closely. These results are illustrated by simulations.

Keywords. Autoregressive, Cramer-Rao bound, asymptotic error analysis.


1.  Introduction 

The problem of estimating the parameters of
signals from their noisy measurements arises in
many engineering applications. A common model
for a wide-sense stationary random signal is the
autoregressive (AR) model. The signal is assumed
to be corrupted by white measurement noise. In
other words,

$$ y_t = x_t + w_t, \tag{1} $$

where $x_t$ is the signal, $w_t$ is a zero-mean white noise
process with variance $\sigma_w^2$, and $y_t$ is the observed
data. The autoregressive signal obeys the stochastic
difference equation,

$$ x_t = -\sum_{i=1}^{n} a_i x_{t-i} + u_t, \tag{2} $$

where $u_t$ is a zero-mean white noise process with
variance $\sigma_u^2$.

* This work was supported by the Army Research Office
under Contract No. DAAG29-83-C-0027.
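As an illustration of the model in (1) and (2), the following minimal sketch generates synthetic AR-plus-noise data. The function name, the random seed and the burn-in length are illustrative assumptions, not part of the paper; the coefficients follow the sign convention of (2).

```python
import numpy as np

def simulate_ar_plus_noise(a, sigma_u2, sigma_w2, n_samples, seed=0, burn_in=500):
    """Generate x_t = -sum_i a_i x_{t-i} + u_t and y_t = x_t + w_t.

    a holds [a_1, ..., a_n]. The seed and burn-in length are arbitrary
    choices made for this sketch.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    n = len(a)
    total = n_samples + burn_in
    u = rng.normal(0.0, np.sqrt(sigma_u2), total)
    x = np.zeros(total)
    for t in range(total):
        # past[i-1] = x_{t-i}; samples before the start are taken as zero
        past = [x[t - i] if t - i >= 0 else 0.0 for i in range(1, n + 1)]
        x[t] = -np.dot(a, past) + u[t]
    # Drop the start-up transient, then add the measurement noise w_t.
    x, u = x[burn_in:], u[burn_in:]
    w = rng.normal(0.0, np.sqrt(sigma_w2), n_samples)
    return x + w, x, u

# Second-order example with a(z) = 1 - 1.4 z + 0.95 z^2.
y, x, u = simulate_ar_plus_noise([-1.4, 0.95], 0.04709, 0.1, 1024)
```

The function returns the observed data together with the clean signal and the driving noise so that the recursion (2) can be checked directly.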
A considerable number of papers in the
engineering and statistical literature treat the processing
and estimation of autoregressive signals,
see e.g., [1-6]. A useful tool for evaluating the
performance of such AR estimation techniques is
the Cramer-Rao lower bound (CRLB) on the
covariance matrix of the estimated parameters [7],
[8]. Comparison of the covariance matrix of a given
parameter estimation technique to the CRLB provides
a measure of the accuracy of that technique.

While the CRLB has been known for quite some
time, relatively little work seems to have been done
on its computational aspects. In [4], [9] a simple
numerical integration procedure for computing the
CRLB is proposed, based on an asymptotic CRLB
formula due to Whittle [10]. In the case of narrowband
AR processes considerable care must be
taken to avoid excessive numerical errors. In this
paper (Section 2) we present an explicit formula
for the asymptotic CRLB for AR-plus-noise processes,
which does not involve numerical integration.

In many applications one is interested not in the
AR parameters, but in some function of these
parameters such as the center frequency, bandwidth
and power of narrowband spectral lines. In
Section 3 we present formulas for computing the
CRLB of a general function of the AR-plus-noise
parameters and of some special commonly used
functions.

In Section 4 we present a few examples illustrating
how to use the CRLB to study the effect of
various signal and noise parameters on estimation
accuracy.


2. An explicit formula for the Fisher information


In this section we derive an explicit expression
for the Fisher information matrix for the parameters
$\{a_1, \ldots, a_n, \sigma_u^2, \sigma_w^2\}$. The inverse of the Fisher
information matrix provides the Cramer-Rao
lower bound on the estimation errors associated
with these parameters. The derivation is somewhat
lengthy, and will be performed in three steps. We
start by introducing the spectral density function
$S(z)$ of the AR-plus-noise process and computing
its derivatives with respect to the various parameters.
Using Whittle's formula for the asymptotic
form of the Fisher information matrix [10] we
express the entries of this matrix by complex
integrals involving the spectrum $S(z)$ and its
derivatives. Finally we evaluate these complex
integrals using certain facts from the theory of
discrete Lyapunov equations.

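2.1. The spectrum and its derivatives

The spectrum of an AR-plus-noise process
defined in (1), (2) is given by

$$ S(z) = \frac{\sigma_u^2}{a(z)a(z^{-1})} + \sigma_w^2 = \frac{\sigma_u^2 + \sigma_w^2\, a(z)a(z^{-1})}{a(z)a(z^{-1})}, \tag{3} $$

where

$$ a(z) = 1 + a_1 z + \cdots + a_n z^n \tag{4} $$

has all of its roots outside the unit circle. Let $c(z)$
and $K$ be defined by

$$ \sigma_u^2 + \sigma_w^2\, a(z)a(z^{-1}) = K c(z)c(z^{-1}), \tag{5} $$

where $c(z)$ is the unique monic stable spectral
factor of the left-hand side of (5), i.e.,

$$ c(z) = 1 + c_1 z + \cdots + c_n z^n, \tag{6} $$

and $c(z)$ has all of its roots outside the unit circle.
$S(z)$ and its inverse are given by

$$ S(z) = \frac{K c(z)c(z^{-1})}{a(z)a(z^{-1})}, \qquad S^{-1}(z) = \frac{a(z)a(z^{-1})}{K c(z)c(z^{-1})}. $$

To compute the Fisher information matrix we need
expressions for the partial derivatives of $S(z)$ with
respect to the parameters $\{a_1, \ldots, a_n, \sigma_u^2, \sigma_w^2\}$.
Straightforward calculations show that

$$ \frac{\partial S(z)}{\partial a_k} = -\frac{\sigma_u^2 z^k}{a^2(z)a(z^{-1})} - \frac{\sigma_u^2 z^{-k}}{a(z)a^2(z^{-1})}, \qquad
\frac{\partial S(z)}{\partial \sigma_u^2} = \frac{1}{a(z)a(z^{-1})}, \qquad
\frac{\partial S(z)}{\partial \sigma_w^2} = 1. $$

As we will see next, the following quantities are
also required.

$$ S^{-1}(z)\frac{\partial S(z)}{\partial a_k} = -\frac{\sigma_u^2 z^k}{K a(z)c(z)c(z^{-1})} - \frac{\sigma_u^2 z^{-k}}{K a(z^{-1})c(z)c(z^{-1})}, $$

$$ S^{-1}(z)\frac{\partial S(z)}{\partial \sigma_u^2} = \frac{1}{K c(z)c(z^{-1})}, \qquad
S^{-1}(z)\frac{\partial S(z)}{\partial \sigma_w^2} = \frac{a(z)a(z^{-1})}{K c(z)c(z^{-1})}. $$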
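The spectral factorization in (5) can be carried out numerically by rooting the symmetric polynomial on its left-hand side. The sketch below is one possible way to do this, assuming $\sigma_w^2 > 0$ and $a_n \neq 0$; the function name and the choice of matching both sides at $z = 1$ to obtain $K$ are illustrative and are not taken from the paper.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def spectral_factor(a, sigma_u2, sigma_w2):
    """Return (K, c) with sigma_u^2 + sigma_w^2 a(z)a(1/z) = K c(z)c(1/z).

    a = [a_1, ..., a_n]; c is returned as [1, c_1, ..., c_n] in ascending
    powers of z. Assumes sigma_w2 > 0 and a_n != 0.
    """
    a_full = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    n = len(a_full) - 1
    # Coefficients of a(z)a(1/z) for z^-n .. z^n (a symmetric sequence).
    p = sigma_w2 * np.correlate(a_full, a_full, mode="full")
    p[n] += sigma_u2
    # z^n * p(z) is an ordinary polynomial whose roots pair up as (r, 1/r).
    roots = npoly.polyroots(p)
    outside = roots[np.abs(roots) > 1.0]
    c = npoly.polyfromroots(outside)      # monic in the highest power
    c = np.real(c / c[0])                 # rescale so that c(0) = 1
    # Match both sides of (5) at z = 1 to obtain K.
    K = (sigma_u2 + sigma_w2 * a_full.sum() ** 2) / c.sum() ** 2
    return K, c

K, c = spectral_factor([-1.4, 0.95], 0.04709, 0.1)
```

Keeping the roots outside the unit circle matches the convention used for $a(z)$ and $c(z)$ in this section.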


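2.2. Whittle's formula

Let $S(z)$ be the spectral density function of a
discrete time stationary zero-mean scalar process,
and assume that this spectral function depends on
some parameter vector $\theta = [\theta_1, \ldots, \theta_m]^T$. It was
shown by Whittle [10] that the asymptotic form of
the Fisher information matrix $J_N$ associated with
these parameters is given by

$$ J_N = [J_{ij}], \quad 1 \le i, j \le m, \tag{10a} $$

$$ J_{ij} = \frac{N}{2}\,\frac{1}{2\pi j}\oint S^{-1}(z)\frac{\partial S(z)}{\partial \theta_i}\, S^{-1}(z)\frac{\partial S(z)}{\partial \theta_j}\,\frac{dz}{z}, \tag{10b} $$

where $\oint$ represents counter-clockwise integration
on the unit circle ($z = e^{j\omega}$), and $N$ is the number
of data points used to estimate the parameters $\theta_i$.
In the AR-plus-noise case the entries of the Fisher
information matrix are given by

$$ \frac{N}{2}\,\frac{1}{2\pi j}\oint S^{-1}\frac{\partial S}{\partial a_k}\, S^{-1}\frac{\partial S}{\partial a_l}\,\frac{dz}{z}
= \frac{N\sigma_u^4}{2\pi j}\oint \frac{z^{-(k+l)}}{K^2 c^2(z)c^2(z^{-1})a^2(z^{-1})}\,\frac{dz}{z}
+ \frac{N\sigma_u^4}{2\pi j}\oint \frac{z^{k-l}}{K^2 c^2(z)c^2(z^{-1})a(z)a(z^{-1})}\,\frac{dz}{z}, \quad 1 \le k, l \le n, \tag{11} $$

$$ \frac{N}{2}\,\frac{1}{2\pi j}\oint S^{-1}\frac{\partial S}{\partial a_k}\, S^{-1}\frac{\partial S}{\partial \sigma_u^2}\,\frac{dz}{z}
= -\frac{N\sigma_u^2}{2\pi j}\oint \frac{z^{-k}}{K^2 c^2(z)c^2(z^{-1})a(z^{-1})}\,\frac{dz}{z}, \quad 1 \le k \le n, \tag{12} $$

$$ \frac{N}{2}\,\frac{1}{2\pi j}\oint S^{-1}\frac{\partial S}{\partial a_k}\, S^{-1}\frac{\partial S}{\partial \sigma_w^2}\,\frac{dz}{z}
= -\frac{N\sigma_u^2}{2\pi j}\oint \frac{a(z)z^{-k}}{K^2 c^2(z)c^2(z^{-1})}\,\frac{dz}{z}, \quad 1 \le k \le n, \tag{13} $$

$$ \frac{N}{2}\,\frac{1}{2\pi j}\oint S^{-1}\frac{\partial S}{\partial \sigma_u^2}\, S^{-1}\frac{\partial S}{\partial \sigma_u^2}\,\frac{dz}{z}
= \frac{N}{2}\,\frac{1}{2\pi j}\oint \frac{1}{K^2 c^2(z)c^2(z^{-1})}\,\frac{dz}{z}, \tag{14} $$

$$ \frac{N}{2}\,\frac{1}{2\pi j}\oint S^{-1}\frac{\partial S}{\partial \sigma_u^2}\, S^{-1}\frac{\partial S}{\partial \sigma_w^2}\,\frac{dz}{z}
= \frac{N}{2}\,\frac{1}{2\pi j}\oint \frac{a(z)a(z^{-1})}{K^2 c^2(z)c^2(z^{-1})}\,\frac{dz}{z}, \tag{15} $$

$$ \frac{N}{2}\,\frac{1}{2\pi j}\oint S^{-1}\frac{\partial S}{\partial \sigma_w^2}\, S^{-1}\frac{\partial S}{\partial \sigma_w^2}\,\frac{dz}{z}
= \frac{N}{2}\,\frac{1}{2\pi j}\oint \frac{a^2(z)a^2(z^{-1})}{K^2 c^2(z)c^2(z^{-1})}\,\frac{dz}{z}. \tag{16} $$

These expressions can be evaluated by numerical
integration. However, if either $a(z)$ or $c(z)$ have
roots very close to the unit circle considerable care
needs to be exercised to avoid numerical problems.
A more attractive way of computing $J_N$ is
described next.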
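The numerical-integration route mentioned above can be sketched as follows. This is not the explicit formula developed in the paper; it simply approximates the unit-circle integral of Whittle's formula by an average over a uniform frequency grid. The function name and the grid size are assumptions made for illustration, and a fine grid is needed when roots lie close to the unit circle.

```python
import numpy as np

def fisher_whittle(a, sigma_u2, sigma_w2, N, grid=4096):
    """Asymptotic Fisher matrix for theta = [a_1..a_n, sigma_u^2, sigma_w^2].

    Approximates the unit-circle integral of Whittle's formula by an
    average over a uniform frequency grid (grid size is arbitrary).
    """
    a = np.asarray(a, dtype=float)
    n = len(a)
    w = 2.0 * np.pi * np.arange(grid) / grid
    z = np.exp(1j * w)
    # a(z) = 1 + a_1 z + ... + a_n z^n evaluated on the unit circle
    az = np.ones(grid, dtype=complex)
    for k in range(1, n + 1):
        az += a[k - 1] * z**k
    aa = np.abs(az) ** 2                  # a(z) a(1/z) on |z| = 1
    S = sigma_u2 / aa + sigma_w2          # AR-plus-noise spectrum
    derivs = []
    for k in range(1, n + 1):
        # dS/da_k = -sigma_u^2 [z^k a(1/z) + z^-k a(z)] / (a(z) a(1/z))^2
        derivs.append(-sigma_u2 * 2.0 * np.real(z**k * np.conj(az)) / aa**2)
    derivs.append(1.0 / aa)               # dS/dsigma_u^2
    derivs.append(np.ones(grid))          # dS/dsigma_w^2
    G = np.array(derivs) / S              # rows: S^-1 dS/dtheta_i
    return 0.5 * N * (G @ G.T) / grid

# Noise-free AR(1) check case: a(z) = 1 + 0.5 z, sigma_u^2 = 2.
J = fisher_whittle([0.5], 2.0, 0.0, N=100)
```

For a noise-free first-order process the entry for $a_1$ reduces to the familiar $N/(1 - a_1^2)$, which gives a quick sanity check of the grid approximation.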

2.3. Evaluation of the complex integrals

To evaluate the integrals in (11)-(16) we must
first introduce some notation. Let the polynomial
$\gamma(z)$ be defined by

$$ \gamma(z) \triangleq c^2(z) = 1 + \gamma_1 z + \cdots + \gamma_{2n} z^{2n}. \tag{17} $$

The stability of $\gamma(z)$ follows from the stability of
$c(z)$. Next denote

$$ \frac{1}{\gamma(z)\gamma(z^{-1})} = \sum_{i=-\infty}^{\infty} v_i z^i, \qquad
C_\gamma = \begin{bmatrix} -\gamma_1 & -\gamma_2 & \cdots & -\gamma_{2n} \\ 1 & 0 & \cdots & 0 \\ & \ddots & & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix}. \tag{18} $$

Then it can be shown [13] that

[Eq. (19), which expresses $v_i$ in terms of $C_\gamma$, is illegible in the source.]

Similarly, we denote

$$ \frac{1}{a(z)a(z^{-1})} = \sum_{i=-\infty}^{\infty} r_i z^i, \tag{20} $$

together with the $n \times n$ companion matrix $A_a$ whose first row is
$[-a_1, \ldots, -a_n]$, and $e_1 = [1, 0, \ldots, 0]^T$, an n-dimensional unit
vector. [Eqs. (21)-(24) are illegible in the source.] Finally, let

$$ \alpha(z) \triangleq a^2(z) = 1 + \alpha_1 z + \cdots + \alpha_{2n} z^{2n}, \tag{25} $$

let $B_\alpha$ be the $2n \times 2n$ companion matrix whose first row is
$[-\alpha_1, \ldots, -\alpha_{2n}]$ (26), and let

$$ g_m = e_1^T (B_\alpha)^m e_1, \quad m \ge 0, \tag{27} $$

where $e_1$ is here a 2n-dimensional unit
vector.

Using the quantities $\{v_i, r_i, g_i\}$ defined above,
we can now evaluate the various complex integrals
introduced earlier. The first integral in (11) is just
the coefficient of $z^{k+l}$ in the power series expansion
of

$$ \frac{1}{\gamma(z)\gamma(z^{-1})\alpha(z^{-1})}. $$

[Eqs. (28)-(29), which give this coefficient in closed form, are illegible in the source.]
The matrices $S$ and $G$ appearing in them are defined in eqs.
(A1) and (A3) of the Appendix.

The second integral in (11) is the coefficient of
$z^{l-k}$ in the power series expansion of

$$ \frac{1}{\gamma(z)\gamma(z^{-1})a(z)a(z^{-1})} = \sum_{i=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} v_i r_m z^{i+m}. $$

[Eqs. (30)-(31), which give this coefficient in closed form, are illegible in the source.]
The matrices $U$ and $H$ appearing in them are defined in eqs. (A8) and (A9)
of the Appendix.

The integral in (12) can be evaluated similarly
to the integral in (29), using $a(z^{-1})$ instead of
$\alpha(z^{-1})$. [The resulting Eq. (32) is illegible in the source.]

It is straightforward to check that the integral in
(13) is given by

$$ \frac{1}{2\pi j}\oint \frac{a(z)z^{-k}}{\gamma(z)\gamma(z^{-1})}\,\frac{dz}{z} = \sum_{i=0}^{n} a_i v_{k-i}, \qquad a_0 = 1. \tag{33} $$

The integral in (14) is given by

$$ \frac{1}{2\pi j}\oint \frac{1}{\gamma(z)\gamma(z^{-1})}\,\frac{dz}{z} = v_0, \tag{34} $$

and the integral in (15) is given by

$$ \frac{1}{2\pi j}\oint \frac{a(z)a(z^{-1})}{\gamma(z)\gamma(z^{-1})}\,\frac{dz}{z} = \sum_{i=0}^{n}\sum_{m=0}^{n} a_i a_m v_{i-m}. \tag{35} $$

Finally, the integral in (16) is given by

$$ \frac{1}{2\pi j}\oint \frac{\alpha(z)\alpha(z^{-1})}{\gamma(z)\gamma(z^{-1})}\,\frac{dz}{z} = \sum_{i=0}^{2n}\sum_{m=0}^{2n} \alpha_i \alpha_m v_{i-m}, \qquad \alpha_0 = 1. \tag{36} $$

Eqs. (29), (31)-(36) provide explicit expressions
for computing the entries of the Fisher information
matrix. The computation of the quantities appearing
in these equations (the scalars $v_i$, $r_i$ and the
matrices $G$, $S$, $U$, $H$) is discussed in the Appendix.
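3. Bounds on the estimates of spectral parameters


In the previous section we developed the
asymptotic bounds on variance of unbiased estimates
of the AR parameters $\theta = [a_1, \ldots, a_n, \sigma_u^2, \sigma_w^2]^T$.
In many applications one is interested not
in the AR parameters, but in some function of
these parameters, such as the spectrum of the
signal-plus-noise (cf. (3)) or the signal only,

$$ S_s(z) = \frac{\sigma_u^2}{a(z)a(z^{-1})}. \tag{37} $$

In the case of signals with narrowband spectra we
may be interested in spectral parameters such as
bandwidth, center-frequency and power of each
narrowband component. In this section we derive
the formulas for computing the bound on the
estimation error of various functions of the
AR-plus-noise parameters.


3.1. A general formula

Given a scalar function $f(\theta)$ of a parameter
vector $\theta = [\theta_1, \ldots, \theta_m]^T$, the variance of any
unbiased estimator of $f(\theta)$ from $N$ data points is
bounded from below by the following generalized
Cramer-Rao bound [11]:

$$ \mathrm{Var}\{\hat{f}(\theta)\} \ge D^T J_N^{-1} D, \tag{38a} $$

where

$$ D^T = \left[\frac{\partial f(\theta)}{\partial \theta_1}, \ldots, \frac{\partial f(\theta)}{\partial \theta_m}\right] = \text{vector of partial derivatives}, \tag{38b} $$

$$ J_N = \text{the Fisher information matrix associated with estimating } \theta \text{ (cf. (10))}. \tag{38c} $$

The computation of $J_N$ was discussed in detail in
Section 2. It remains to evaluate the derivative
vector $D$ for the functions of interest.

3.2. The signal-plus-noise spectrum

To evaluate the bounds on the signal-plus-noise
spectrum $S(e^{j\omega})$ we must evaluate the entries of

$$ D_S^T = \left[\frac{\partial S(e^{j\omega})}{\partial a_1}, \ldots, \frac{\partial S(e^{j\omega})}{\partial a_n}, \frac{\partial S(e^{j\omega})}{\partial \sigma_u^2}, \frac{\partial S(e^{j\omega})}{\partial \sigma_w^2}\right]. \tag{39} $$

Straightforward differentiation of equation (3)
gives the following:

$$ \frac{\partial S(e^{j\omega})}{\partial a_k} = -\sigma_u^2\,\frac{e^{j\omega k}a(e^{-j\omega}) + e^{-j\omega k}a(e^{j\omega})}{\left[a(e^{j\omega})a(e^{-j\omega})\right]^2}, \tag{40a} $$

$$ \frac{\partial S(e^{j\omega})}{\partial \sigma_u^2} = \frac{1}{a(e^{j\omega})a(e^{-j\omega})}, \tag{40b} $$

$$ \frac{\partial S(e^{j\omega})}{\partial \sigma_w^2} = 1. \tag{40c} $$

To evaluate the bound on the signal spectrum
$S_s(e^{j\omega})$ we must use

$$ D_{S_s}^T = \left[\frac{\partial S_s(e^{j\omega})}{\partial a_1}, \ldots, \frac{\partial S_s(e^{j\omega})}{\partial a_n}, \frac{\partial S_s(e^{j\omega})}{\partial \sigma_u^2}, 0\right]. \tag{41} $$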

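3.3. Spectral parameters of a second order AR-plus-noise process

Consider a second order AR process with a
polynomial having a complex-conjugate pair of roots at angles $\pm\omega_0$:

$$ a(z) = 1 + a_1 z + a_2 z^2 = 1 - 2\rho\cos\omega_0\, z + \rho^2 z^2. \tag{42} $$

The central frequency $f_0$ of the spectrum of this
process is defined by the angle (or phase) of the
roots:

$$ f_0 = \omega_0 / 2\pi. \tag{43} $$

To compute the bound on the estimation error
of $f_0$ we must evaluate

$$ D_{f_0}^T = \left[\frac{\partial f_0}{\partial a_1}, \frac{\partial f_0}{\partial a_2}, \frac{\partial f_0}{\partial \sigma_u^2}, \frac{\partial f_0}{\partial \sigma_w^2}\right]. \tag{44} $$

The partial derivatives are given by

$$ \frac{\partial f_0}{\partial a_1} = \frac{1}{2\pi\sqrt{4a_2 - a_1^2}}, \tag{45} $$

$$ \frac{\partial f_0}{\partial a_2} = \frac{-a_1}{4\pi a_2\sqrt{4a_2 - a_1^2}}, \tag{46} $$

$$ \frac{\partial f_0}{\partial \sigma_u^2} = \frac{\partial f_0}{\partial \sigma_w^2} = 0. \tag{47} $$

Another frequency of interest is the frequency
$\tilde{f}$ for which $S_s(e^{j\omega})$ is maximized. Note that in the
second order case

$$ S_s(e^{j\omega}) = \frac{\sigma_u^2}{1 + a_1^2 + a_2^2 + 2a_1(1 + a_2)\cos\omega + 2a_2\cos 2\omega}. \tag{48} $$

The frequency at which $S_s(e^{j\omega})$ is maximum is
the frequency at which $1/S_s(e^{j\omega})$ is minimum. We
find this frequency by setting the derivative to zero:

$$ \frac{\partial}{\partial \omega}\left(1/S_s(e^{j\omega})\right) = -\frac{2\sin\omega}{\sigma_u^2}\left[a_1(1 + a_2) + 4a_2\cos\omega\right] = 0. \tag{49} $$

The points $\omega = 0$ and $\omega = \pi$ correspond to minima
of $S_s(e^{j\omega})$. The maximum is attained at

$$ \tilde{f} = \frac{\tilde{\omega}}{2\pi} = \frac{1}{2\pi}\cos^{-1}\left(-\frac{a_1(1 + a_2)}{4a_2}\right). \tag{50} $$

In this case

$$ D_{\tilde{f}}^T = \left[\frac{\partial \tilde{f}}{\partial a_1}, \frac{\partial \tilde{f}}{\partial a_2}, 0, 0\right], \tag{51} $$

$$ \frac{\partial \tilde{f}}{\partial a_1} = \frac{1}{2\pi\sin\tilde{\omega}}\cdot\frac{1 + a_2}{4a_2}, \tag{52} $$

$$ \frac{\partial \tilde{f}}{\partial a_2} = -\frac{1}{2\pi\sin\tilde{\omega}}\cdot\frac{a_1}{4a_2^2}. \tag{53} $$

Another spectral parameter of practical interest
is the bandwidth of the spectrum. We will use the
so-called noise bandwidth defined by

$$ B = \frac{\dfrac{1}{2\pi}\displaystyle\int_0^{\pi} S_s(e^{j\omega})\,d\omega}{S_s(e^{j\tilde{\omega}})}. \tag{54} $$

The numerator is the energy contained in the positive
frequencies, while the denominator is the peak
energy density. The resulting $B$ is normalized with
respect to 0.5 Hz (i.e., for white noise, $B = 0.5$ Hz).
Note that

$$ \frac{1}{2\pi}\int_0^{\pi} S_s(e^{j\omega})\,d\omega = \frac{\sigma_u^2(1 + a_2)}{2(1 - a_2)(1 + a_2 + a_1)(1 + a_2 - a_1)}, \tag{55} $$

$$ S_s(e^{j\tilde{\omega}}) = \frac{4a_2\sigma_u^2}{(1 - a_2)^2(4a_2 - a_1^2)}, \tag{56} $$

so that

$$ B = \frac{(1 + a_2)(1 - a_2)(4a_2 - a_1^2)}{8a_2(1 + a_2 + a_1)(1 + a_2 - a_1)}. \tag{57} $$

To evaluate the bound on the error variance of
$B$ we need to compute

$$ D_B^T = \left[\frac{\partial B}{\partial a_1}, \frac{\partial B}{\partial a_2}, 0, 0\right]. \tag{58} $$

Note that

$$ \frac{\partial \log B}{\partial a_1} = \frac{-2a_1}{4a_2 - a_1^2} - \frac{1}{1 + a_2 + a_1} + \frac{1}{1 + a_2 - a_1}, \tag{59a} $$

$$ \frac{\partial \log B}{\partial a_2} = \frac{1}{1 + a_2} - \frac{1}{1 - a_2} + \frac{4}{4a_2 - a_1^2} - \frac{1}{a_2} - \frac{1}{1 + a_2 + a_1} - \frac{1}{1 + a_2 - a_1}. \tag{59b} $$

Thus,

$$ \frac{\partial B}{\partial a_1} = B\,\frac{\partial \log B}{\partial a_1}, \tag{60a} $$

$$ \frac{\partial B}{\partial a_2} = B\,\frac{\partial \log B}{\partial a_2}. \tag{60b} $$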
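The closed-form expressions for the peak frequency and the noise bandwidth of the second-order case are easy to check numerically. The sketch below evaluates them for the two models used in the examples of Section 4; the function names are illustrative and not part of the paper.

```python
import numpy as np

def peak_frequency(a1, a2):
    """Frequency (cycles/sample) maximizing the AR(2) signal spectrum."""
    return np.arccos(-a1 * (1.0 + a2) / (4.0 * a2)) / (2.0 * np.pi)

def noise_bandwidth(a1, a2):
    """Noise bandwidth B: positive-frequency energy over peak density."""
    num = (1.0 + a2) * (1.0 - a2) * (4.0 * a2 - a1**2)
    den = 8.0 * a2 * (1.0 + a2 + a1) * (1.0 + a2 - a1)
    return num / den

def bandwidth_gradient(a1, a2):
    """[dB/da1, dB/da2] obtained from the derivatives of log B."""
    dlog_a1 = (-2.0 * a1 / (4.0 * a2 - a1**2)
               - 1.0 / (1.0 + a2 + a1) + 1.0 / (1.0 + a2 - a1))
    dlog_a2 = (1.0 / (1.0 + a2) - 1.0 / (1.0 - a2)
               + 4.0 / (4.0 * a2 - a1**2) - 1.0 / a2
               - 1.0 / (1.0 + a2 + a1) - 1.0 / (1.0 + a2 - a1))
    B = noise_bandwidth(a1, a2)
    return np.array([B * dlog_a1, B * dlog_a2])

# Narrowband model a(z) = 1 - 1.4 z + 0.95 z^2
f_nb, B_nb = peak_frequency(-1.4, 0.95), noise_bandwidth(-1.4, 0.95)
# Broadband model a(z) = 1 - 0.45 z + 0.55 z^2
f_bb, B_bb = peak_frequency(-0.45, 0.55), noise_bandwidth(-0.45, 0.55)
```

The resulting values agree with the nominal frequency and bandwidth quoted in the column headings of the tables in Section 4.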
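Finally, we consider the signal power, defined as

$$ P = \frac{1}{2\pi j}\oint \frac{\sigma_u^2}{a(z)a(z^{-1})}\,\frac{dz}{z}. \tag{61} $$

For a second-order AR process, the complex integral
yields

$$ P = \frac{\sigma_u^2(1 + a_2)}{(1 - a_2)(1 + a_2 + a_1)(1 + a_2 - a_1)}. \tag{62} $$

To evaluate the bound on $P$ we must compute

$$ D_P^T = \left[\frac{\partial P}{\partial a_1}, \frac{\partial P}{\partial a_2}, \frac{\partial P}{\partial \sigma_u^2}, 0\right]. \tag{63} $$

Note that

$$ \frac{\partial \log P}{\partial a_1} = \frac{1}{1 + a_2 - a_1} - \frac{1}{1 + a_2 + a_1}, \qquad
\frac{\partial \log P}{\partial a_2} = \frac{1}{1 + a_2} + \frac{1}{1 - a_2} - \frac{1}{1 + a_2 + a_1} - \frac{1}{1 + a_2 - a_1}, \tag{64} $$

thus,

$$ \frac{\partial P}{\partial a_1} = P\,\frac{\partial \log P}{\partial a_1}, \tag{65a} $$

$$ \frac{\partial P}{\partial a_2} = P\,\frac{\partial \log P}{\partial a_2}, \tag{65b} $$

and finally,

$$ \frac{\partial P}{\partial \sigma_u^2} = \frac{P}{\sigma_u^2}. \tag{66} $$

Inserting the appropriate derivative vector $D$ in (38a) gives an explicit formula for
computing the CRLB for various spectral parameters
of practical interest.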
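To make the last step concrete, the following sketch evaluates the signal power, its derivative vector, and the generalized bound $D^T J_N^{-1} D$ for a Fisher information matrix supplied by the caller. The function names are illustrative assumptions, and the small diagonal matrix in the usage line is a made-up placeholder rather than a Fisher matrix from the paper.

```python
import numpy as np

def signal_power(a1, a2, sigma_u2):
    """Signal power P of a second-order AR process."""
    return sigma_u2 * (1.0 + a2) / ((1.0 - a2) * (1.0 + a2 + a1) * (1.0 + a2 - a1))

def power_gradient(a1, a2, sigma_u2):
    """D_P = [dP/da1, dP/da2, dP/dsigma_u^2, dP/dsigma_w^2]."""
    P = signal_power(a1, a2, sigma_u2)
    dlog_a1 = 1.0 / (1.0 + a2 - a1) - 1.0 / (1.0 + a2 + a1)
    dlog_a2 = (1.0 / (1.0 + a2) + 1.0 / (1.0 - a2)
               - 1.0 / (1.0 + a2 + a1) - 1.0 / (1.0 + a2 - a1))
    # P does not depend on the measurement-noise variance, hence the 0.
    return np.array([P * dlog_a1, P * dlog_a2, P / sigma_u2, 0.0])

def crlb(D, J):
    """Generalized Cramer-Rao bound D^T J^{-1} D."""
    D = np.asarray(D, dtype=float)
    return float(D @ np.linalg.solve(J, D))

P_nb = signal_power(-1.4, 0.95, 0.04709)
D_P = power_gradient(-1.4, 0.95, 0.04709)
# Placeholder information matrix, used only to show the call pattern.
bound = crlb([1.0, 2.0], [[2.0, 0.0], [0.0, 4.0]])
```

With the variances quoted in Section 4, both example models have a signal power of approximately one, which is consistent with the nominal value $P = 1.0$ in the tables.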


4. Some examples

In this section we present a few examples illustrating
the usefulness of the bounds derived in the
previous sections. Two AR models are considered:

$S_1$: $a(z) = 1 - 1.4z + 0.95z^2$ (narrowband)

$S_2$: $a(z) = 1 - 0.45z + 0.55z^2$ (broadband)


4.1. Spectral bounds

Using the derivative vector $D_S$ (39) we computed
the CRLB for the signal-plus-noise spectrum for
$S_1$ and $S_2$, at different frequencies. The true spectrum
and the ±1 standard deviation curves are
depicted in Figs. 1 and 2. These plots provide some
insight into the achievable spectral estimation
accuracy for the given signal and noise parameters.


4.2. Bounds on center frequency, bandwidth and
power

Using the derivative vectors in equations (51),
(58) and (63) we computed the CRLB for $f$, $B$ and
$P$ for $S_1$ and $S_2$, at different signal-to-noise ratios.
The results are summarized in Tables 1 and 2.

Examination of these tables reveals various
interesting facts. In the narrowband case the center
frequency can be estimated much more accurately
than bandwidth and power. Note for example that
at SNR = 3 dB the relative accuracy (i.e., standard
deviation divided by the mean) of $f$ is 0.8%, of $B$
25% and of $P$ 20%. The situation is similar in the
broadband case. However, the center frequency is
estimated less accurately than in the narrowband
case. For example, at SNR = 3 dB the relative
accuracies of $f$, $B$ and $P$ are 2.4%, 15% and 16%.
This type of behavior has been observed in simulation
studies of various parametric spectral estimation
techniques.


Table 1

Bounds on the standard deviation of the estimates of $f$, $B$ and
$P$. $a(z) = 1 - 1.4z^{-1} + 0.95z^{-2}$; $\sigma_u^2 = 0.04709$; $N = 1024$

SNR (dB)    f = 0.1224       B = 0.01281      P = 1.0
    3       0.9440·10^-3     0.3153·10^-2     0.2005
    0       0.1016·10^-2     0.3473·10^-2     0.2068
   -3       0.1128·10^-2     0.3970·10^-2     0.2200
   -6       0.1309·10^-2     0.4772·10^-2     0.2482
   -9       0.1612·10^-2     0.6125·10^-2     0.3072
  -12       0.2147·10^-2     0.8528·10^-2     0.4278
  -15       0.3137·10^-2     0.1299·10^-1     0.6691
Fig. 1. Spectral bound for $S_1$. $\sigma_u^2 = 0.04709$, $\sigma_w^2 = 0.1$, SNR = 10 dB, $N = 256$.


Table 2

Bounds on the standard deviation of the estimates of $f$, $B$ and
$P$. $a(z) = 1 - 0.45z^{-1} + 0.55z^{-2}$; $\sigma_u^2 = 0.6384$; $N = 1024$

SNR (dB)    f = 0.1986       B = 0.1439       P = 1.0
    3       0.4792·10^-2     0.2257·10^-1     0.1154
    0       0.6091·10^-2     0.2940·10^-1     0.1607
   -3       0.3486·10^-2     0.4227·10^-1     0.2493
   -6       0.1303·10^-1     0.6693·10^-1     0.4224
   -9       0.2188·10^-1     0.1153           0.7637


5. Conclusions

We presented formulas for computing the CRLB
for different spectral parameters of an AR-plus-noise
process. The proposed formulas make it
possible to compute the CRLB without requiring
numerical integration. These bounds provide a useful
reference point for the performance evaluation
of autoregressive estimation techniques.
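It can be checked by direct computation that

$$ G C_\gamma = C_\gamma^T G. $$

Hence:

$$ GS - C_\gamma^T G S B_\alpha^T = GS - G C_\gamma S B_\alpha^T = G(S - C_\gamma S B_\alpha^T) \tag{A4} $$

$$ = G e_1 e_1^T = e_{2n} e_1^T. \tag{A5} $$

It follows that,

$$ GS = \sum_{m=0}^{\infty} (C_\gamma^T)^m e_{2n} e_1^T (B_\alpha^T)^m. \tag{A6} $$

Let

$$ U = \sum_{m=0}^{\infty} (C_\gamma)^m e_1 e_1^T (A_a^T)^m, \tag{A7} $$

where $e_1 = [1, 0, \ldots, 0]^T$ is an n-dimensional unit
vector. $U$ satisfies

$$ U - C_\gamma U A_a^T = e_1 e_1^T. \tag{A8} $$

Let $H$ be the matrix

[Eq. (A9), the definition of $H$, is illegible in the source.]

Then

$$ H A_a = A_a^T H. $$

Hence:

$$ GUH - C_\gamma^T GUH A_a = GUH - G C_\gamma U A_a^T H = G(U - C_\gamma U A_a^T)H = G e_1 e_1^T H = e_{2n} e_1^T, \tag{A10} $$

and finally,

$$ GUH = \sum_{m=0}^{\infty} (C_\gamma^T)^m e_{2n} e_1^T (A_a)^m. \tag{A11} $$
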
The numerical solution of the Lyapunov
equations (A2) and (A8) can be performed
efficiently using the algorithms suggested in [16]
and [17]. It is worthwhile to note that $S$ and $U$ are
(nonsymmetric) Toeplitz matrices. Thus, their
entries are fully determined by their first row and
column.

To compute $\{v_i\}$ we solve the Lyapunov equation

$$ X - C_\gamma X C_\gamma^T = e_1 e_1^T, \tag{A12} $$

where $X$ is a Toeplitz matrix. The first $2n$ terms
of $\{v_i\}$ are the entries of the first column of $X$.
Higher order terms of $\{v_i\}$ are obtained from the
recursion

$$ v_i = -\sum_{m=1}^{2n} \gamma_m v_{i-m}, \quad i \ge 2n. \tag{A13} $$

Similarly, the first $n$ terms of $\{r_i\}$ are the entries of
the first column (or row) of the Toeplitz matrix $Y$,
where

$$ Y - A_a Y A_a^T = e_1 e_1^T. \tag{A14} $$

Higher order terms of $\{r_i\}$ are obtained from the
recursion

$$ r_i = -\sum_{m=1}^{n} a_m r_{i-m}, \quad i \ge n. \tag{A15} $$

These formulas are explained in more detail in [13].
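The two-stage computation described above (initial terms from a linear equation, higher-order terms from a recursion) can be sketched as follows. Note that this sketch swaps in a small Yule-Walker-type linear system for the Lyapunov equation, which is an assumption made for brevity rather than the method of the paper; the function name is also illustrative. It applies to any stable polynomial $g(z)$ with $g(0) = 1$, for example $\gamma(z)$ or $a(z)$.

```python
import numpy as np

def laurent_coeffs(g, count):
    """Coefficients v_0..v_{count-1} of 1/(g(z) g(1/z)), g = [1, g_1, ..., g_p].

    The first p+1 values come from a small linear system (used here in
    place of the Lyapunov equation); the rest follow from the recursion
    v_k = -sum_m g_m v_{k-m}. Assumes all roots of g lie outside |z| = 1.
    """
    g = np.asarray(g, dtype=float)
    p = len(g) - 1
    M = np.zeros((p + 1, p + 1))
    for k in range(p + 1):
        for i in range(p + 1):
            # Row k encodes sum_i g_i v_{|k-i|} = 1 if k == 0 else 0.
            M[k, abs(k - i)] += g[i]
    rhs = np.zeros(p + 1)
    rhs[0] = 1.0
    v = list(np.linalg.solve(M, rhs))
    while len(v) < count:
        k = len(v)
        v.append(-sum(g[m] * v[k - m] for m in range(1, p + 1)))
    return np.array(v[:count])

# First-order check case: v_k = (-0.5)^k / (1 - 0.25).
v = laurent_coeffs([1.0, 0.5], 4)
```

Since the sequence is symmetric, $v_{-i} = v_i$, only non-negative indices need to be stored.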
The modified Yule-Walker (MYW) equations for estimating
the AR parameters of an ARMA process are presented as a special
case of an instrumental variable (IV) method. The consistency and
accuracy of the AR parameter estimates are studied. It is shown that
estimation accuracy increases monotonically with the number of MYW
equations for an optimal choice of the weighting matrix used in the
least-squares solution of these equations. The asymptotic error covariance of
the optimal IV method equals that of the prediction error method. The
results of this paper verify experimental results reported in the literature
regarding the performance of the MYW method, and provide the
necessary accuracy analysis. Furthermore, they suggest several simple,
asymptotically efficient, multistep algorithms for estimating the AR
parameters, which are presented in a companion paper.

I.  Introduction 

The  need  for  estimating  the  parameters  of  an  autoregressive 
moving-average  (ARMA)  process  arises  in  many  applications 
in the areas of signal processing, spectral analysis, and system
identification. A computationally attractive estimation procedure,
which  has  received  considerable  attention  in  the  literature,  is 
based  on  a  two-step  approach:  first  the  autoregressive  (AR) 
parameters  are  estimated  using  the  modified  Yule-Walker 
(MYW)  equations;  then  the  moving  average  (MA)  parameters  are 
estimated  by  one  of  several  available  techniques. 

In  this  paper  we  consider  only  the  first  step  of  estimating  the 
autoregressive  parameters.  In  many  engineering  applications  the 
second estimation step is not needed. The prime example is the estimation of autoregressive signals corrupted by white measurement noise. In this case all the information about the spectral shape of the signal lies in the AR parameters of the signal-plus-noise ARMA process (see, e.g., [29]).

The  relative  simplicity  of  the  MYW  estimator  motivated  a 
number  of  authors  to  investigate  this  technique  and  to  develop 
various extensions and variations [1]-[10]. Most of this work has
been done in the context of high resolution spectral analysis. One
of the important observations made in these studies is that
significant  improvements  in  estimation  accuracy  can  be  obtained 
by  increasing  the  number  of  MYW  equations  [2],  [9].  The 
resulting  set  of  overdetermined  equations  is  then  solved  by  some 
least-squares technique. The possibility of using a weighted least-squares procedure was also discussed (see, e.g., [1], [2]).
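The overdetermined MYW idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses exact (theoretical) autocovariances computed from a truncated impulse response, whereas in practice sample covariances would be used; it applies unweighted least squares (weighting matrix equal to the identity); and the function names and the ARMA(2,2) coefficients are chosen only for the example.

```python
def arma_autocov(a, c, lam2, kmax, n_terms=2000):
    """Autocovariances r(0..kmax) of A(q^-1) y = C(q^-1) e, via the impulse response."""
    na, nc = len(a), len(c)
    h = []
    for k in range(n_terms):
        val = 1.0 if k == 0 else (c[k - 1] if k <= nc else 0.0)
        for i in range(1, min(k, na) + 1):
            val -= a[i - 1] * h[k - i]
        h.append(val)
    return [lam2 * sum(h[j] * h[j + k] for j in range(n_terms - k))
            for k in range(kmax + 1)]


def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def myw_ar_estimate(r, na, nc, m):
    """Unweighted least-squares solution of m >= na MYW equations
    r(k) + a_1 r(k-1) + ... + a_na r(k-na) = 0,  k = nc+1, ..., nc+m."""
    ks = range(nc + 1, nc + m + 1)
    rows = [[r[abs(k - i)] for i in range(1, na + 1)] for k in ks]
    rhs = [-r[k] for k in ks]
    # Normal equations of the overdetermined system.
    MtM = [[sum(row[i] * row[j] for row in rows) for j in range(na)]
           for i in range(na)]
    Mtb = [sum(row[i] * y for row, y in zip(rows, rhs)) for i in range(na)]
    return solve_linear(MtM, Mtb)


# Illustrative ARMA(2,2) coefficients (assumed values).
r = arma_autocov(a=[-1.5, 0.7], c=[-1.0, 0.2], lam2=1.0, kmax=10)
a_hat = myw_ar_estimate(r, na=2, nc=2, m=6)
print(a_hat)
```

With exact covariances every MYW equation holds exactly, so any m >= na recovers the AR coefficients. The accuracy questions studied in the paper arise only once the covariances are replaced by sample estimates.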

Performance evaluation of the MYW method has in the past been done by simulation. A formal accuracy theory appears to be lacking. It is our objective in this paper to fill this gap and provide an asymptotic accuracy analysis. This analysis clarifies the precise role of increasing the number of equations and of including a weighting matrix. It provides a valuable verification for experimental observations as well as guidelines for further improvements of MYW based ARMA estimation techniques.

The  MYW  method  is  related  to  the  instrumental  variable  (IV) 
method of parameter estimation [8], [11], [12]. In Section II we
define  an  IV  estimator  which  is  slightly  more  general  than  the 
MYW  estimators  presented  in  the  literature.  In  Section  III  we 
establish  the  consistency  of  the  IV  estimates  and  develop  an 
explicit  formula  for  the  covariance  matrix  of  the  estimation 
errors.  This  formula  can  be  used  to  evaluate  the  asymptotic 
performance of various MYW algorithms proposed in the literature [23], [30]. In Section IV we study the optimization of estimation accuracy with respect to the weighting matrix and the number of equations. We show the existence of an optimal choice of the weighting matrix, which minimizes the covariance matrix of the estimation errors. Furthermore, we show that the optimal error covariance matrix decreases monotonically when the number of equations is increased, and converges as the number of equations tends to infinity. The form of this limiting matrix is also presented, and in Section V it is shown that it equals the asymptotic error covariance of the prediction error method. The effect of a certain filter used in the generation of the instrumental variables on the convergence rate of the error covariance matrix of the optimally weighted IV estimate is studied in Section VI. It is shown that there exists an optimal choice of this filter which gives the fastest convergence rate.

The  optimal  IV  methods  presented  in  this  paper  can  be  used  to 
derive several new AR parameter estimation algorithms with improved accuracy and modest computational cost. In a companion paper [24] we present several such algorithms, analyze their asymptotic properties, and evaluate their performance by simulation.

Finally,  we  note  that  results  related  to  those  presented  here 
appeared recently in [31]-[33]. The problem considered in these
references is the estimation of the parameters of dynamic
econometric models by IV methods with instruments that are not
exogenous. The approach used in [31]-[33] is based on a different
formalism  from  the  one  used  here. 

II. The Estimation Method

Consider  the  following  ARMA  process  of  order  (na,  nc): 

$$A(q^{-1})\,y(t) = C(q^{-1})\,e(t) \qquad (1)$$

where $e(t)$ is a white noise process with zero mean and variance $\lambda^2$, and

$$A(q^{-1}) = 1 + a_1 q^{-1} + \cdots + a_{na} q^{-na}$$

$$C(q^{-1}) = 1 + c_1 q^{-1} + \cdots + c_{nc} q^{-nc}$$

$q^{-1}$ = unit delay operator ($q^{-1} y(t) = y(t-1)$).
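To make the model (1) concrete, the following sketch generates $y(t)$ from a given noise sequence by running the difference equation directly. It assumes zero initial conditions; the function name `arma_filter` and the numerical values are illustrative assumptions.

```python
def arma_filter(a, c, e):
    """Generate y(t) from A(q^-1) y(t) = C(q^-1) e(t), zero initial conditions.

    a = [a_1, ..., a_na] and c = [c_1, ..., c_nc] are the polynomial
    coefficients after the leading 1; e is the input noise sequence.
    """
    y = []
    for t in range(len(e)):
        val = e[t]
        # Moving-average part: c_1 e(t-1) + ... + c_nc e(t-nc)
        val += sum(c[i - 1] * e[t - i] for i in range(1, len(c) + 1) if t - i >= 0)
        # Autoregressive part moved to the right-hand side.
        val -= sum(a[i - 1] * y[t - i] for i in range(1, len(a) + 1) if t - i >= 0)
        y.append(val)
    return y


# Impulse response of (1 - 0.8 q^-1) y(t) = (1 + 0.7 q^-1) e(t); assumed values.
y = arma_filter(a=[-0.8], c=[0.7], e=[1.0, 0.0, 0.0])
print(y)
```

Feeding a unit impulse returns the first impulse-response coefficients of $C(q^{-1})/A(q^{-1})$; feeding white noise returns a realization of the ARMA process, apart from the initial transient.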

The  following  assumptions  are  made: 

A1: $A(z) = 0 \Rightarrow |z| > 1$; $C(z) = 0 \Rightarrow |z| > 1$. In other words, the ARMA representation (1) is stable and invertible. This is not a restrictive assumption (cf. the spectral factorization theorem, e.g., [28]).
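Assumption A1 can be checked numerically without computing polynomial roots. The sketch below uses the classical step-down (reflection coefficient) recursion for real coefficients: the polynomial $1 + a_1 z + \cdots + a_n z^n$ has all zeros in $|z| > 1$ exactly when every reflection coefficient has modulus less than one. The function name and test polynomials are assumptions for illustration; this test is a standard technique and is not taken from the paper.

```python
def zeros_outside_unit_circle(coeffs):
    """True if 1 + coeffs[0] z + ... + coeffs[n-1] z^n has all zeros in |z| > 1.

    coeffs must be real. Uses the step-down recursion: the last coefficient
    of each reduced-order polynomial is a reflection coefficient.
    """
    a = list(coeffs)
    while a:
        k = a[-1]  # reflection coefficient of the current order
        if abs(k) >= 1.0:
            return False
        n = len(a)
        # Coefficients of the polynomial of order n - 1.
        a = [(a[i] - k * a[n - 2 - i]) / (1.0 - k * k) for i in range(n - 1)]
    return True


# Illustrative polynomials (assumed values).
print(zeros_outside_unit_circle([-1.5, 0.7]))   # stable second-order A
print(zeros_outside_unit_circle([-2.0, 0.75]))  # has a zero at z = 2/3
```

Applying the same function to the coefficients of $C$ checks invertibility.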


0018-9286/85/1100-1066$01.00 © 1985 IEEE

where

$$S = E\{[C(q^{-1})z(t)][C(q^{-1})z(t)]^T\} \qquad (15)$$


with $\hat{\theta}$ and $R$ defined by (6) and (11), respectively.

Theorem 1 can be used to evaluate various choices of $Q$, $G(\cdot)$, and $m$ by comparing the accuracies of the resulting estimates. In [30] we evaluated $P$ for some low-order ARMA systems and various choices of $Q$ and $m$ (with $G(q^{-1}) \equiv 1$). It was observed that accuracy does not increase monotonically with $m$, in contrast with statements sometimes made in the literature on the overdetermined MYW equations [2], [3]. Furthermore, it appears difficult, if not impossible, to predict which ad-hoc weighting matrix $Q$ will lead to best accuracy.

We have also compared the accuracy of the IV estimate to that given by the prediction error method (PEM) [18], [19], [26], for some simple low-order systems; see, e.g., Example 1 in Section V and the examples in [23]. Recall that in the Gaussian case, the PEM error covariance matrix equals the Cramer-Rao lower bound. The differences in accuracy between the IV method and the PEM were sometimes considerable, indicating that the IV estimator with ad-hoc choices of $Q$, $m$, and $G(q^{-1})$ is inefficient (in the statistical sense).

The questions raised above motivate the more detailed examination of the accuracy aspects of the IV estimates. In particular, it is of interest to choose $Q$, $m$, and $G(q^{-1})$ so as to increase the accuracy of the IV estimate (6). This is discussed in Sections IV-VI.


IV. Optimization of Estimation Accuracy


The problem of determining optimal IV estimates in the fairly general class of estimates defined by (6) can be stated as follows. Find $Q_{opt}$, $m_{opt}$, and $G_{opt}(q^{-1})$ such that the corresponding covariance matrix $P_{opt}$ has the property $P \ge P_{opt}$, where $P$ corresponds to any other admissible choice of $Q$, $m$, and $G$. This type of problem was studied in [12], [13] for systems with exogenous inputs, such as ARMAX systems. The results of [12], [13] cannot be applied directly to the ARMA problem, as is explained in [30]. Therefore, we must approach the accuracy optimization in another way. As we will see, the optimization with respect to $Q$, $m$, and $G(q^{-1})$ can be treated in three distinct steps. We start with the optimization of $P$ given by (14) with respect to the weighting matrix $Q$, for which the following result holds.

Theorem 2: Consider the matrix $P$ defined in (14). We have

$$P \ge P_m \triangleq \lambda^2 (R^T S^{-1} R)^{-1}. \qquad (16)$$

Furthermore, the equality $P = P_m$ holds if and only if

$$S Q R = R (R^T S^{-1} R)^{-1} (R^T Q R). \qquad (17)$$

Proof: It is straightforward to show that

$$P - P_m = \lambda^2 \left[(R^T Q R)^{-1} R^T Q - (R^T S^{-1} R)^{-1} R^T S^{-1}\right] S \left[(R^T Q R)^{-1} R^T Q - (R^T S^{-1} R)^{-1} R^T S^{-1}\right]^T.$$

Since $S > 0$, (16) and (17) follow.

Note that (16) is closely related to the Gauss-Markov theorem in regression theory [22]. An obvious way to satisfy (17) is to set $Q = S^{-1}$, in which case $P = P_m$.
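The effect of the weighting matrix can be checked numerically in the simplest case of a single AR parameter, where $R$ is a column vector and the covariance is a scalar. The sketch below assumes the usual sandwich form $(R^T Q R)^{-1} R^T Q S Q R (R^T Q R)^{-1}$ for the covariance, up to a common positive scalar factor that does not affect the comparison. The numerical values of $R$ and $S$ are made up for illustration and do not correspond to any system in the paper.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def iv_variance(R, S, Q):
    """Scalar sandwich (R'QR)^-1 R'Q S Q R (R'QR)^-1 for symmetric Q."""
    m = len(R)
    QR = [sum(Q[i][j] * R[j] for j in range(m)) for i in range(m)]
    num = sum(QR[i] * S[i][j] * QR[j] for i in range(m) for j in range(m))
    den = sum(R[i] * QR[i] for i in range(m))
    return num / den ** 2


def optimal_variance(R, S):
    """Lower bound (R' S^-1 R)^-1 for the scalar case."""
    x = solve_linear(S, R)
    return 1.0 / sum(r * xi for r, xi in zip(R, x))


# Made-up example: m = 3 instruments, one parameter, S positive definite.
R = [1.0, 0.5, 0.25]
S = [[2.0, 0.8, 0.3], [0.8, 2.0, 0.8], [0.3, 0.8, 2.0]]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
# Columns of S^-1; S is symmetric, so these are also its rows.
S_inv = [solve_linear(S, [1.0 if i == j else 0.0 for i in range(3)]) for j in range(3)]

p_identity = iv_variance(R, S, identity)
p_weighted = iv_variance(R, S, S_inv)
p_bound = optimal_variance(R, S)
print(p_identity, p_weighted, p_bound)
```

In this scalar case the inequality is an instance of the Cauchy-Schwarz inequality, and the choice $Q = S^{-1}$ attains the bound.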

Next we consider the optimization of $P_m$ with respect to $m$. In Section VI (Lemma 2) we will formally prove that for the optimal choice of $Q$, estimation accuracy increases monotonically with $m$, i.e., $P_m \ge P_{m+1}$ for all $m \ge na$. As was mentioned earlier, this is not true for arbitrary choices of $Q$ [23], [30].

Note that the results above are valid for general IV estimation problems. The detailed structure of the matrices $R$ and $S$ is not used anywhere in the proofs. Note also that for AR systems it can be shown that $P_{m+1} = P_m$ ($m \ge na$) [21]. However, for ARMA processes we have in general $P_m > P_{m+1}$.

Since $P_m$ is monotonically decreasing and also $P_m > 0$, it follows that $P_m$ will converge to a limit as $m$ tends to infinity. A formal discussion of the convergence of $P_m$ is given in Appendix B, where it is also shown that

$$P_\infty \triangleq \lim_{m \to \infty} P_m = \lambda^4 \left[E\{\phi(t)\psi^T(t)\}\, E\{\psi(t)\phi^T(t)\}\right]^{-1} \qquad (18)$$

where $\psi(t)$ is the following infinite-dimensional vector:

$$\psi(t) = \frac{1}{C(q^{-1})} \begin{bmatrix} e(t-nc-1) \\ e(t-nc-2) \\ \vdots \end{bmatrix}. \qquad (19)$$

The limiting error covariance matrix $P_\infty$ can be evaluated by solving a certain discrete Lyapunov equation (see (A.5) in Appendix A and (B.17) in Appendix B). Note that $P_\infty$ is independent of $G(\cdot)$. We will show, however, in Section VI that the choice of $G(\cdot)$ affects the "convergence rate" of $P_m$.
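A discrete Lyapunov equation of the form $X - A X A^T = Q$ with a stability matrix $A$ can be solved by summing the series $X = \sum_{k \ge 0} A^k Q (A^T)^k$. The sketch below is a minimal illustration of that series, not the algorithm used by the authors. The function names, the tolerance, and the example companion matrix are assumptions; the actual right-hand sides of (A.5) and (B.17) are not reproduced here.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def solve_discrete_lyapunov(A, Q, tol=1e-12, max_iter=10000):
    """Solve X - A X A^T = Q by summing X = sum_k A^k Q (A^T)^k."""
    At = transpose(A)
    X = [row[:] for row in Q]
    term = [row[:] for row in Q]
    for _ in range(max_iter):
        term = matmul(matmul(A, term), At)  # next term A^k Q (A^T)^k
        X = [[x + t for x, t in zip(rx, rt)] for rx, rt in zip(X, term)]
        if max(abs(t) for row in term for t in row) < tol:
            return X
    raise RuntimeError("series did not converge; is A a stability matrix?")


# Companion matrix of 1 - 1.5 z + 0.7 z^2 (assumed example) and Q = u u^T.
A = [[1.5, -0.7], [1.0, 0.0]]
Q = [[1.0, 0.0], [0.0, 0.0]]
X = solve_discrete_lyapunov(A, Q)

# Residual of the Lyapunov equation, which should be close to zero.
AXAt = matmul(matmul(A, X), transpose(A))
residual = max(abs(X[i][j] - AXAt[i][j] - Q[i][j]) for i in range(2) for j in range(2))
print(X, residual)
```

The series converges geometrically at a rate set by the spectral radius of $A$, so it becomes slow when the poles are close to the unit circle. Direct solvers are preferable in that case.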


V.  Comparison  of  the  Accuracies  of  the  Optimal  IV 
Method  and  the  Prediction  Error  Method 


The prediction error method has been studied widely in the context of system identification [18], [19], [26]. The prediction error estimate of the parameters $\{a_i, c_i\}$ of an ARMA system is obtained by minimizing the loss function


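$$V_N(a_1, \ldots, a_{na}, c_1, \ldots, c_{nc}) = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2(t) \qquad (20)$$

where

$$\varepsilon(t) = \frac{A(q^{-1})}{C(q^{-1})}\, y(t). \qquad (21)$$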


The prediction error estimate is known to be asymptotically normally distributed with the following normalized covariance matrix:


N 


{«!•  •  •  ff...  C|,  •  ■  1?^} 


K[- WO 


where 


‘  [e(T-  1).  •  •  •,  eU-na)], 

Aiq  ') 


(23) 


C(q-') 


::^[e(/-  1),  •••,  eU-nc)]. 


(24) 


It is straightforward to show from (22) that the normalized covariance matrix of the AR parameter estimates obtained by the PEM is given by

$$P_{PEM} \triangleq \lim_{N \to \infty} N \,\mathrm{cov}\{\hat{a}\} = \lambda^2 \left[D_{11} - D_{12} D_{22}^{-1} D_{12}^T\right]^{-1} \qquad (25)$$

where

$$D_{ij} = E\{\phi_i(t)\phi_j^T(t)\}, \quad i, j = 1, 2. \qquad (26b)$$

The following result states that the optimal IV method has the same asymptotic accuracy as the PEM.

Theorem 3: Let $P_\infty$ and $P_{PEM}$ be the covariance matrices defined by (18) and (23)-(26), respectively. Then, under assumptions A1-A3, $P_\infty = P_{PEM}$.

Proof: See Appendix A.

As was mentioned earlier, in the Gaussian case, the PEM is an efficient estimator, i.e., $P_{PEM}$ equals the Cramer-Rao lower bound [19], [22]. We conclude therefore that the optimal IV method is an efficient estimator for Gaussian processes. If the data are not Gaussian, then the optimal IV estimate, like the PE estimate, will still give the minimum variance in the fairly large class of parameter estimators whose covariance matrices depend only on the second-order statistics of the data.

It is interesting to investigate the rate at which $P_m$ converges to $P_\infty = P_{PEM}$, since in practice the value of $m$ cannot be too large. The "convergence rate" of $P_m$ is illustrated by the following examples.

Example 1: Convergence of $P_m$ to $P_{PEM}$. Consider the ARMA processes

$S_1$: $(1 - 0.8q^{-1})\,y(t) = (1 + 0.7q^{-1})\,e(t)$

$S_2$: $(1 - 1.5q^{-1} + 0.7q^{-2})\,y(t) = (1 - q^{-1} + 0.2q^{-2})\,e(t)$

where in both cases $E\{e(t)e(s)\} = \delta_{t,s}$ ($\lambda^2 = 1$). For both $S_1$ and $S_2$ we evaluated $P_{PEM}$ and the optimal covariance matrix $P_m$, for $G(z) \equiv 1$ and $m = na, na + 1, \ldots$. The results are shown in Table I, whose entries are the $(i, j)$th elements of $P_m$. Note that $P_m$ has essentially converged for $m = 13$. It is interesting to compare the accuracy of the optimal IV method to that of the basic modified Yule-Walker method ($m = na$, in which case the choice of $Q$ is irrelevant). The difference in accuracies can be quite large. For example, in the case of $S_1$, the ratio of the variances of $\hat{a}_1$ corresponding to the two methods is about 30. For higher order systems the difference in accuracy between the methods may be larger (see [30]).

Example 2: Convergence of $P_m$ to $P_{PEM}$. Note that $P_m$ approaches $P_{PEM}$ more or less at an exponential rate (cf. Example 1). To investigate the convergence rate in more detail consider the general ARMA(1,1) process

$$y(t) = -a\,y(t-1) + e(t) + c\,e(t-1). \qquad (27)$$

Assuming that

$$P_m \approx P_{PEM} + K\gamma^m, \quad 0 \le \gamma < 1, \quad K = \text{constant}, \qquad (28)$$

it seems reasonable to plot $\ln\{(P_m - P_{PEM})/P_{PEM}\}$ versus $m$. This is done in Figs. 1 and 2 for $G(q^{-1}) \equiv 1$ and different values of the parameters $a$ and $c$. It can be seen that except for small values of $m$, the curves can be well approximated by straight lines. This justifies the assumption in (28). It is interesting to note that the convergence rate depends strongly on $c$, and only weakly on $a$. The convergence is particularly slow when $c$ is close to $-1$ (zero near the unit circle).

Similar results hold for $c$ close to $+1$. The large variations in convergence rates for different parameters of the data motivate the study of ways for improving the convergence rate. In the next section we show how the choice of $G(q^{-1})$ affects the convergence rate.
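The straight-line reasoning of Example 2 can be sketched numerically. Assuming an exponential model of the form $P_m \approx P_{PEM} + K\gamma^m$, the quantity $\ln\{(P_m - P_{PEM})/P_{PEM}\}$ is linear in $m$ with slope $\ln\gamma$, so $\gamma$ can be read off from a least-squares line fit. The data below are synthetic and generated from the assumed model; they are not values from Table I or the figures, and the function name is hypothetical.

```python
import math


def fit_convergence_rate(ms, p_m, p_pem):
    """Fit a line to (m, ln((P_m - P_PEM) / P_PEM)).

    Returns (gamma, rel_amplitude), where gamma = exp(slope) and
    rel_amplitude = exp(intercept) estimates K / P_PEM.
    """
    xs = [float(m) for m in ms]
    ys = [math.log((p - p_pem) / p_pem) for p in p_m]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(slope), math.exp(intercept)


# Synthetic data generated from the assumed model (illustrative values only).
p_pem_true, K_true, gamma_true = 0.39, 0.10, 0.6
ms = list(range(1, 11))
p_m = [p_pem_true + K_true * gamma_true ** m for m in ms]

gamma_hat, rel_amp_hat = fit_convergence_rate(ms, p_m, p_pem_true)
print(gamma_hat, rel_amp_hat)
```

A value of $\gamma$ close to one corresponds to the slow convergence reported for $c$ near $\pm 1$.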

VI. The Optimal Choice of $G(q^{-1})$

In this section we show that the choice $G(q^{-1}) = 1/C^2(q^{-1})$ will ensure that optimal estimation accuracy is achieved for a finite $m$, in fact for $m = na$. To see this we state the following lemma. Note that in the following calculations we will add the subscript $m$ to $R$, $S$, $z(t)$, etc., to emphasize their dependence on the number of instrumental variables.

TABLE I

CONVERGENCE OF $P_m$ TO $P_\infty = P_{PEM}$ FOR $S_1$ AND $S_2$

Fig. 1. The convergence of $P_m$; $c = -0.9$, varying $a$.

Lemma 2: The matrices $\{P_m\}$ form a nonincreasing sequence, i.e., $P_{na} \ge P_{na+1} \ge \cdots \ge P_m \ge P_{m+1} \ge \cdots$. Furthermore, all the equalities hold if and only if


RlSmX„ 


for  m^na 


where  Rm,  S„  are  as  defined  by  (11).  (15).  and 


jf«A£  C»(q-')C(ff-') 


Proof:  See  Appendix  C.  S 

It is now easy to see that the choice $G(q^{-1}) = 1/C^2(q^{-1})$ will satisfy (29) and is, therefore, optimal (although not necessarily the only optimal choice). We state this formally in the following theorem.

Theorem 4: Let assumptions A1-A3 hold true and consider the IV estimate (6) with $m = na$ and $G(q^{-1}) = 1/C^2(q^{-1})$ (the choice of $Q$ is irrelevant in this case). Under these conditions the IV estimate will be optimal in the sense that its asymptotic ($N \to \infty$) covariance matrix equals $P_\infty$ ($= P_{PEM}$).

Proof: Direct consequence of Lemma 2.


VII.  Conclusions 


Appendix A


Proof of Theorem 3


Let us introduce the following notation:


It  is  straightforward  to  show  that 


+  +  ■  ■  ■ 


J^e(r-m-l)j.  (30) 


C{<7-’)e(/) 


-0- 


for  k'^nc*  1. 


®  and  hence 


for  ifc>nc-t- 1 


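where $A$ is the following companion matrix associated with the polynomial $A(z)$:

$$A = \begin{bmatrix} -a_1 & -a_2 & \cdots & -a_{na} \\ 1 & 0 & \cdots & 0 \\ & \ddots & \ddots & \vdots \\ 0 & & 1 & 0 \end{bmatrix}. \qquad (A.4)$$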


We presented a detailed analysis of the accuracy aspects of a general IV method for estimating the AR parameters of an ARMA process. The basic accuracy result (Theorem 1) is useful for evaluating the performance bounds for the various MYW related estimation techniques discussed in the literature. See, for example, the discussion in [23], [30].

More importantly, Theorem 1 can be used to investigate the existence of optimal IV methods. We derived a lower bound on the estimation accuracy of IV estimators and presented methods for achieving this bound.

The first method involved an optimal weighting matrix $Q = S^{-1}$ and letting the number $m$ of instrumental variables increase to infinity. In this case the choice of the filter $G(q^{-1})$ becomes unimportant and we may set $G(q^{-1}) \equiv 1$ (see Theorem 2).

The second method involved an optimal filtering operation $G(q^{-1}) = 1/C^2(q^{-1})$. In this case the asymptotic bound is achieved for $m = na$, and the choice of the weighting matrix $Q$ is unimportant (see Theorem 4).

Furthermore, we have shown that the optimal IV method has the same (asymptotic) accuracy as the prediction error method (see Theorem 3).

The methods discussed above suggest two new algorithms for estimating the AR parameters of ARMA models. These algorithms are discussed in some detail in a companion paper [24]. Note that both of these methods require knowledge of certain quantities [such as $C(q^{-1})$] which are not available a priori. In [24] it is shown that replacing those quantities by their consistent estimates does not degrade the asymptotic estimation accuracy.

Finally, we note that the optimal weighting matrix $Q = S^{-1}$ (required by the first method) can be estimated without explicit estimation of the MA parameters. This is convenient in some applications where one needs only estimates of the AR parameters.


It  follows  from  (A.1)-(A.3)  and  (18)  that 


In other words, $P_\infty^{-1}$ satisfies the following Lyapunov equation
[see also (B.17)]


Pm' 


Since $A$ is a stability matrix, (A.5) has a unique solution (see, e.g., [20]). To show that $P_\infty = P_{PEM}$ it is thus sufficient and necessary to show that $P_{PEM}^{-1}$ satisfies the same Lyapunov equation (A.5). We do this in the following steps. First note that

$$A\phi_1(t) = \phi_1(t+1) - e(t)\,u_1, \qquad u_1 = [1 \;\; 0 \;\; \cdots \;\; 0]^T$$


and  therefore 


ADfiA 


where $D_{11}$ is as defined in (26b).
Next, we introduce


•_c,  ....  _c, 

1  0  0 


and note that since $c_{nc} \neq 0$ the companion matrix $C$ is
nonsingular, and that


Ci^(/)-02(/+ l)-e(/)Ni 


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'*'**•*  then  from  (A.  15)-(A.  18)  we  have  that 

«i-[i  0  •••  01'’.  (A.ii)  rr '  1  r  ; 

r,  •••  r^.,] 

We  can  now  wnte  / 

ADtiD.:2'DliA  '■-^AjC'’(CDsjC')  -  'CD^iA  ^ 

-  [x*ii,  -  c«r«  -  c«.(/f*  -  r«)j/x* 

*  (Z)|2 -  X*ii,i«fx^  - ■  '(^IJ -  X*Ui«r)  (A.  12) 

where  ^12,  Du,  Du  are  as  defined  in  (26b).  It  follows  from  the 
matrix  inversion  lemma  chat 

By  using  (A.8),  (A.  12).  and  (A.  13)  we  obtain  after  some  To  walMtt  the  ^nominator  of  the  right-hand  side  of  (A.  14).  we 
straightforward  but  somewhat  tedious  calculations  *“*  oot®*" 


which  gives 


*t\-  DiiDu  Uiat-^  /f* 


n-l  <0-1  .  r_  .  ’  (“1  ~  AjO;,'ui)(U|  -fJi.O.Vttl)'" 

^PEM~-'’^reM-'*  “A  - 1 — ' 

1  -  X-uiD-'u, 

•  22  * 


1  -X-U.'^Oy'l/xa  1  -d  -C«)=*Cw. 


“.22“-  It  follows  that  the  right-hand  side  of  (A.  14)  reduces  to  1/ 

precisely  the  right-hand  side  of  (A.5).  We 
t  •  I  )  have  shown  chat  /*; '  and  obey  the  same  Lyapunov  equation 
According  to  a  well-known  formula  for  the  inverse  of  the  therefore /*. '  *  Pfin- 
covariance  matrix  of  an  AR  process  [27].  we  have 

Appendix  B 


®a'“2*n 


Convergence  of 

(A.15)

In this Appendix we consider the convergence as $m \to \infty$ of the inverse of the optimal error covariance matrix


To  proceed  we  note  the  following  properties  of  the  covariance 


elements  of  D^.  Let 

We  have 

7*+C|1'*-t  +  •  •  •  +C.»T*.» 


^ m  ’^RmSm  Rm  (B.l) 

where $R_m$ and $S_m$ are defined by (11) and (15), respectively. We
start by introducing the following notation:

't~E{}rU)  ■  G{q-'Mt-k)}. 


.(Ciq-')  I 


>^t.  for  ail  k. 


T*  +  Cit»«,  +  --  -  +c«T**« 


I-!- 

{Aid-') 


e(f)  •  e(/- 


rx»  kmO 
1 0  k<0. 

If  similarly  to  (A.1)  we  introduce 


Note  that 

/»+«lA:-l  H - -^Onafk-m 

•E{A(g~')y{t)  ■  C(q-')Xt-k)\mO,  k^itc-t-l.  (B.3) 

If we let $A$ be the companion matrix defined in (A.4), then (B.3)
implies that

RamAfii,.,  Ar>/ic+l.  (B.4) 

Let  us  also  introduce 

m~C{q-')C{q-')yU). 

•  Wt-\)  ■Mt-m)]}. 

(B.5) 

We can now state the following result.

Lemma B1: Consider the sequence of matrices $\{P_m^{-1}\}$, $m = 1, 2, \ldots$, defined by (B.1). The following Lyapunov-type equation holds true

/*;!>  -^>*-'-4  ’■«-V  (R^-ARl,sV4>m) 

■  {R^-ARlsV4>mV,  m.l,  2.  •••.  (8.6) 

Proof:  First  note  that  according  to  (B.4) 

ARi\. 

Next,  we  have 


and 


VmSm  -  Vm 


Consider  first  (B.12).  We  have 
'At 


+  0(mM"*‘).  (B.13) 


S'' ,  > 


'I'm 

'I'm 


0  0 

0  s;' 


1 


-1 


(-1. 0^5;'i. 


Therefore,  we  can  write 

fi-',,mR!,.^S:[,R„„mAR:,S:.'R^’' 

-r-!r  V*)'' 

which concludes the proof.

Next, we study the limit as $m \to \infty$ of the right-hand side of (B.6).

Lemma B2: Let $m \to \infty$. Then, under assumptions A1-A3


E^Mt)  ■  j  j 

£^(6(/)  •  C(<7-')j^|;  h,y(t-nc-i)-y(t- 

j 


(B.7a) 


=  £U(/)-^, 

-£{«(/)  ■  C(i)‘')y(r-/7c)l 
+  00r"'*'). 

Further  straightforward  calculations  give 

/40(/)  =  «(/+ l)-C(<7-')e(/)w,.  M,= 


(B.I4) 


•  ^^^.v^g(/-/»c-l)j  mdm- 


no. 


(B.15) 


Proof:  Define 

,.o  *C^d-')C(<?-')  ■ 

Due  to  assumptions  A1  and  A3 


(B.7b) 


(Afl-l).  (B.8) 


(B.9) 


Combining  (B.  14)  and  (B.  15).  we  obtain 

■*'  I 


ar: 


La,J 


where  c  is  a  constant,  and  0  <  ^  <  1  is  the  maximum  modulus  of 
the  zeros  of  CHg-')C(q-<).  Now  A{q-')fU)  - 

CHg '  ')C(q '  'MO.  so  that  for  m  large  enough  we  can  write 

j’U) +  *!/(/-  1)T  ■  •  •  TA,jt(/-m)-t-0(M"*')-e(r).  (B.IO) 

It  follows  from  (B.IO)  that 

"a, 


(I’ll.*  -Sm 


Hence 


-t-OOi— '). 


(B.ll) 


-ARi 


(B.16) 

This equation together with (B.12) implies (B.7b). Next consider
(B.13). We have

*-[I]  [I  *"“">]] 

-£^>’(/)  ■  l^f;  A,/(r-()-;>(r)  +  0(M"'*')j  j 

-£{>((/)  •  e(r)}-£{J(^f)}+0(M-') 
-X*-£ty*(f)J+0(M"'*') 

which together with (B.5) and (B.13) proves (B.7a). This concludes the proof of Lemma B2.

It is now straightforward to evaluate $P_\infty^{-1} \triangleq \lim_{m \to \infty} P_m^{-1}$. The limit exists since we have shown earlier that $P_m \ge P_{m+1} > 0$ (see Lemma 2). Furthermore, it follows from Lemmas B1 and B2 that $P_\infty^{-1}$ satisfies


+<Km^‘’*'),  (B.12) 


P:'-Ap-jA^m.^P^Pl. 


(B.17) 


STOICA et al.: OPTIMAL INSTRUMENTAL VARIABLE ESTIMATES


As is well known, under the given assumptions the solution of
(B.17) is unique and is given by

kmO 

In  Appendix  A  we  have  shown  that  ^  /(/?». i.  Therefore, 


1  ■  ■  ’i 

ST 

1 

-£ 


C(<7-‘)C(«-‘) 


Ai-n 

yU~m) 


■  {C{q-')C(q-')[y(t-\)  ■  ■  ■  y{t-m)\a 


where 


*C(q'')G(q-')y(t-m-  1)}] 


0.  a„,  •••,  a,]'’. 

Next,  introduce 


(C.3) 


which is precisely (18).

Appendix C

Proof of Lemma 2

Note that we can write

a 


^  =  E{y(t)  ■  Giq-')y{t-k)]-, 


(C.4) 


and  note  that 


/<  +a|/< .  I  +  •  ■  •  +  a,./i 

=  £■{£(<7  ■')e(/)i7(a  ■')/(!-*)}  =0.  k^nc*-l  (C.5) 


it„  =  £{C{<7 -'):„(/)  C{q-')G{q-')yU-nc-m-l)). 


and.  therefore,  that 


a  =  E{C{q-')G{q-')yU)\'. 


+  for  k:^nc  +  na. 


and 


It  follows  from  (C.3)  that 


Em*  I  “ 


d* Is; '  t(-, -«„+/?  ia -£  is; 'x*. 
However,  from  (C.  I)  and  (C.2)  we  have. 


d«»£{d(/)  •  (j(<7*')>'(r-nc-m- 1)>. 


Therefore. 


0 


6 


«  +  +a/<»*»-«»0,  for  m>Na. 


-/»;'+ a„(d.,  -  £is;  V„i(d«  -  £is;  'k]  ^ 


(C.1)

Hence, (C.2) reduces to (29) and the proof is completed.


where 
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OF  AR  PARAMETERS:  A  MONTE  CARLO  ANALYSIS  OF  FINITE-SAMPLE  PROPERTIES 

Petre  Stoica,  Benjamin  Friedlander,  and  Torsten  Soderstrom 

ABSTRACT 

A  Monte  Carlo  analysis  of  the  accuracy  properties  of  least  squares  (LS), 
Yule-Walker  (YW),  and  the  overdetermined  Yule-Walker  (OYW)  methods  for 
estimating  the  parameters  of  autoregressive  (AR)  processes  is  presented. 
Comparisons  of  the  estimated  finite-sample  accuracy  to  the  theoretical 
asymptotic  accuracy  are  included.  It  is  shown  that  considerable  differences 
may  occur  in  some  cases.  Choice  of  the  number  of  equations  in  the  YW  system 
of  equations  is  discussed.  Some  remarks  concerning  the  feasibility  and 
usefulness  of  an  analytical  study  of  the  finite-sample  accuracy  properties  are 
also  included. 
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1.  INTRODUCTION 


There  are  many  parameter  estimation  methods  in  use  today.  For  most  of 
them,  an  asymptotic  accuracy  theory  is  available.  The  interest  in 
establishing  the  accuracy  properties  of  an  estimation  method  is  motivated  by 
at  least  the  following:  (i)  interval  estimation;  (ii)  hypothesis  testing; 
(iii)  accuracy  comparisons  with  other  estimation  methods;  (iv)  accuracy 
optimization  with  respect  to  some  "design  variables"  which  are  at  the  disposal 
of  the  user.  The  asymptotic  accuracy  theory  has  often  been  used  for  solving 
problems  such  as  those  listed  above.  However,  in  some  cases,  the  asymptotic 
theory  is  not  applicable  for  the  sample  lengths  encountered  in  practice.  In 
recent  years,  three  main  directions  of  research  for  overcoming  this  difficulty 
have appeared:

(i)  Analytical  studies  aimed  at  establishing  the  exact  finite-sample 
accuracy  (moments  or  distribution)  of  the  parameter  estimators;  this  turned 
out  to  be  possible  in  some  simple  cases  (a  typical  example  being  the  LS 
estimator  of  the  first-order  AR  parameter).  See  [4,  6,  9,  10,  23]. 

(ii)  Higher  order  approximations  of  the  exact  accuracy  (moments  or 
distribution).  This  approach  proved  more  flexible  than  the  one  above,  yet 
provided  quite  accurate  approximations;  see  [1,  17-22]. 

(iii) Monte Carlo analysis of the finite-sample accuracy properties. This is a conceptually simple and general approach; see [5, 7, 8, 12, 15].
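Approach (iii) can be sketched in a few lines for the simplest case, the LS estimate of a first-order AR parameter. The sketch below is an illustration only and is not the experimental setup of this paper: the model $y(t) + a\,y(t-1) = e(t)$ with unit-variance Gaussian noise, the sample length, the number of runs, the burn-in length, and the seed are all assumed values. The normalized sample variance it returns can be compared with the well-known asymptotic value $1 - a^2$ for this estimator.

```python
import random


def simulate_ar1(a, n, rng, burn_in=200):
    """Simulate y(t) + a*y(t-1) = e(t) with unit-variance Gaussian e(t)."""
    y_prev, out = 0.0, []
    for t in range(burn_in + n):
        y_prev = -a * y_prev + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the initial transient
            out.append(y_prev)
    return out


def ls_ar1(y):
    """Least-squares estimate of a in y(t) + a*y(t-1) = e(t)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return -num / den


def monte_carlo(a, n, runs, seed=0):
    """Return (mean of estimates, n * sample variance of estimates)."""
    rng = random.Random(seed)
    est = [ls_ar1(simulate_ar1(a, n, rng)) for _ in range(runs)]
    mean = sum(est) / runs
    var = sum((x - mean) ** 2 for x in est) / (runs - 1)
    return mean, n * var


a_true, n_samples = -0.9, 50
mean_hat, norm_var = monte_carlo(a=a_true, n=n_samples, runs=500)
print("mean estimate:", mean_hat)
print("N * variance:", norm_var, "asymptotic value:", 1 - a_true ** 2)
```

For short samples and a pole close to the unit circle, the mean of the estimates is typically pulled toward zero and the normalized variance differs from its asymptotic value. This is the kind of discrepancy that a Monte Carlo study makes visible.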

The aim of this paper is twofold: (i) To comment briefly on the three general approaches mentioned above. This general discussion is included in the next section. (ii) To consider a specific estimation problem for illustrating some of the main issues addressed in the general discussion. More specifically, the problem of estimating the AR parameters is considered, and a Monte Carlo analysis of the accuracy properties of three methods frequently used for AR parameter estimation, Least Squares (LS), Yule-Walker (YW), and Overdetermined Yule-Walker (OYW), is presented.

It  is  perhaps  worth  remarking  that  most  papers  on  small-sample  properties 
have  appeared  in  the  econometric  literature.  A  possible  reason  for  this  is 
the  fact  that  econometricians  deal  more  often  than  engineers  with  short 
samples  (for  example,  containing  around  50  data  points).  However,  as  the 
simulations  of  this  paper  will  show,  significant  discrepancies  between  the 
finite  sample  behavior  and  that  predicted  by  the  asymptotic  theory  may  well 
appear  even  for  sample  lengths  encountered  in  engineering  applications. 

An  outline  of  this  paper  is  as  follows.  A  general  discussion  on 
approaches  to  the  analysis  of  finite-sample  distributional  properties  of 
parameter  estimators  is  given  in  the  next  section.  In  Section  3  we  briefly 
describe  the  LS,  YW,  and  OYW  methods  for  estimating  the  AR  parameters.  Their 
asymptotic  accuracy  properties  are  reviewed  in  Section  4,  where  it  is  also 
shown that the asymptotic covariance matrix of the YW estimator is bounded from
above  by  the  covariance  matrix  of  the  OYW  estimator.  Section  5  contains  the 
results  of  a  Monte  Carlo  analysis.  Finally,  some  concluding  remarks  are 
presented  in  Section  6. 

2.  GENERAL  DISCUSSION 

There are at least two points which are of interest when discussing the approaches mentioned above: feasibility and usefulness.

For many estimators currently in use it is a formidable if not impossible task to establish the exact finite-sample properties of the distribution. In some simple cases, this task becomes feasible but the resulting exact expressions (for example, of the distribution moments) are so complicated that their usefulness may be questioned (see [10] and its references, where a cumbersome formula is given for the finite-sample variance of the estimated parameter of a first-order AR process).

Specifically, let us suppose that $\theta$ is the unknown parameter vector and $\hat{\theta}_N$ its estimate obtained from an N-length sample. Introduce the normalized covariance matrix of the estimation errors

$$P_N(\theta) = N\,E\{(\hat{\theta}_N - \theta)(\hat{\theta}_N - \theta)^T\}, \qquad (1)$$

and let $P_\infty(\theta)$ denote the asymptotic covariance matrix

$$P_\infty(\theta) = \lim_{N \to \infty} P_N(\theta). \qquad (2)$$

For  many  (consistent)  estimators  currently  used  in  system  identification,  the 
above  limit  exists  under  weak  conditions.  Furthermore,  we  have 

^  0(1/N^^^)  .  (3) 

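In practice, when using $P_N(\theta)$ or $P_\infty(\theta)$ for purposes such as interval estimation or hypothesis testing, we have to replace $\theta$ by $\hat{\theta}_N$. Since

$$\hat{\theta}_N - \theta = O(1/N^{1/2})$$

we have

$$P_\infty(\hat{\theta}_N) = P_\infty(\theta) + O(1/N^{1/2}).$$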


On  the  other  hand,  from  (3), 


Thus, there is apparently no guarantee that $P_N(\hat{\theta}_N)$ is a better estimate of $P_N(\theta)$ than is $P_\infty(\hat{\theta}_N)$. The above discussion is valid for $N$ sufficiently large. For "small" $N$, the above calculations are no longer valid. However, since $P_N(\theta)$ has a more complicated expression than $P_\infty(\theta)$, it may still be true that replacement of $\theta$ by $\hat{\theta}_N$ may in some cases lead to larger errors for $P_N(\cdot)$ than for $P_\infty(\cdot)$.

Next consider the problems of accuracy comparisons with other estimation
methods, and accuracy optimization with respect to some "design variables"
which are at the disposal of the user. For many estimation methods, there
exist asymptotic results for both the optimization of accuracy and for
comparison with the accuracy achieved by other estimation methods. However,
these results may fail to apply for the sample lengths encountered in practice
and are thus of little use in such situations. For example, asymptotically
equivalent estimation methods have been shown to behave quite differently in
the finite-sample case (see [5] and Section 5 of this paper). A considerable
departure from asymptotic theory was reported in [15], where it was shown by
extensive Monte Carlo simulations that in some cases the ordinary LS estimator
may be better than the idealized Markov estimator in terms of both bias and
variance. Since $P_N(\theta)$ will in general have a complicated expression, it
is unlikely that analytical comparisons and optimizations of accuracy would be
possible in the finite-sample case. However, what should be possible is to
evaluate $P_N(\theta)$ numerically for different N and $\theta$. This may serve to
identify sets in the parameter space and values of N for which one
estimation method is better than another, and also to provide guidelines for
"optimally" choosing the design variables defining the estimation method in
question. The Monte Carlo analysis approach addresses the two objectives
mentioned above. The Monte Carlo approach provides only an estimate of
$P_N(\theta)$ (or of the distribution function). The larger the number of
replications used in the Monte Carlo experiment, the better will be this
estimate. Furthermore, a Monte Carlo analysis may be quite costly in terms of
the computer time involved. However, when an expression for $P_N(\theta)$ is not
available, the Monte Carlo analysis may be the only solution at hand. The
Monte Carlo analysis may also be the preferred approach when the evaluation of
the available expression for $P_N(\theta)$ requires a very cumbersome algorithm
(see [3]). Extensive Monte Carlo analyses for evaluation of various
instrumental variable methods are given in [27,28].

Finally,  the  development  of  higher  order  approximations  for  estimator 
accuracy  seems  to  be  the  most  promising  one  from  a  theoretical  point  of  view. 
Essentially,  it  follows  the  lines  of  the  asymptotic  analysis  but  takes  into 
account  also  some  higher  order  terms  in  (3).  Truncating  asymptotic  series 
expansions  after  a  small  number  of  terms  is  frequently  used  to  get  improved 
approximations  of  parameter  estimate  distribution  or  of  its  moments.  A 
different  approach  to  approximate  analysis  of  finite-sample  distribution  was 
recently  proposed  in  [20]. 
In many situations, the development of approximations is a more feasible
theoretical approach than the development of exact formulas. Also, it should
lead to more manageable expressions for the covariance matrix of the
estimation errors, etc. We believe that this approach is a topic that
warrants more attention. Some recent results on the finite-sample covariance
structure of the sampled covariances of ARMA processes (see [2, 3]) might be
useful in this context (at least for studying the so-called correlation-based
techniques). We may also remark that Monte Carlo simulation results may be
useful when deriving approximate finite-sample properties of the distribution
by using the analytical approach of [20].

In  the  next  section,  we  will  consider  three  methods  for  estimating  the  AR 
parameters.  Even  if  estimating  the  AR  parameters  is  apparently  one  of  the 
simplest  dynamic  estimation  problems,  an  exact  finite-sample  accuracy  theory 
does  not  seem  to  be  available  for  any  of  the  methods  considered.  An  analysis 
of  the  finite  sample  properties  is  beyond  the  scope  of  this  paper.  Instead, 
we  resort  to  Monte  Carlo  analysis  to  show  that: 

(i) The asymptotic and finite-sample accuracy properties may be quite
different in some cases.

(ii) The number of YW equations used for estimation has a considerable
influence on the accuracy. (Some guidelines for choosing that number are
discussed.)

(iii) The LS method performs in most cases better than the other two
methods tested.

3.  ESTIMATION  METHODS 

Consider  the  following  general  AR  process 

$$y(t) + a_1 y(t-1) + \ldots + a_n y(t-n) = e(t), \qquad (4)$$

where $\{e(t)\}$ is a sequence of independent and identically distributed random
variables with zero mean and variance denoted $\lambda^2$, and the real
coefficients $\{a_i\}$ are such that the polynomial

$$A(z) = 1 + a_1 z + \ldots + a_n z^n \qquad (5)$$

has all its zeros outside the unit circle.

The  AR  model  (4)  is  used  in  many  applications  in  engineering, 
econometrics,  biometrics,  geophysics,  etc.  and  a  number  of  methods  are 
available  for  estimating  its  parameters.  Of  these,  perhaps  the  most  commonly 
used  ones  are  the  following  three. 


3. 1  The  LS  Method 

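Let $\theta$ denote the vector of unknown parameters

$$\theta = [a_1 \; \ldots \; a_n]^T. \qquad (6)$$

The LS estimate of $\theta$ is defined as

$$\hat{\theta} = \arg\min_{\theta} \sum_{t=n+1}^{N} [y(t) - \varphi^T(t)\theta]^2, \qquad (7)$$

where

$$\varphi(t) = [-y(t-1) \; \ldots \; -y(t-n)]^T. \qquad (8)$$

After some straightforward calculations, (7) produces the result [29]

$$\hat{\theta} = \Big[\sum_{t=n+1}^{N} \varphi(t)\varphi^T(t)\Big]^{-1} \Big[\sum_{t=n+1}^{N} \varphi(t) y(t)\Big]. \qquad (9)$$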




The  inverse  in  (9)  exists  at  least  for  large  N. 
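The LS estimate (9) can be illustrated with a small, self-contained sketch. It is only an illustration under stated assumptions: the helper names (`solve`, `ls_estimate`, `simulate_ar`) are hypothetical, the normal equations are solved by plain Gaussian elimination, and the simulation discards a burn-in segment, which is a choice made here and not something taken from the paper.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ls_estimate(y, n):
    """LS estimate of [a_1 ... a_n] for y(t) + a_1 y(t-1) + ... + a_n y(t-n) = e(t)."""
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for t in range(n, len(y)):
        # Regressor phi(t) = [-y(t-1) ... -y(t-n)]^T, as in (8).
        phi = [-y[t - i] for i in range(1, n + 1)]
        for i in range(n):
            b[i] += phi[i] * y[t]
            for j in range(n):
                A[i][j] += phi[i] * phi[j]
    return solve(A, b)

def simulate_ar(a, N, rng, burn_in=500):
    """Simulate the AR model (4) with unit-variance Gaussian noise (burn-in is an assumption)."""
    y = [0.0] * len(a)
    for _ in range(N + burn_in):
        y.append(rng.gauss(0.0, 1.0) - sum(ai * y[-i] for i, ai in enumerate(a, 1)))
    return y[-N:]
```

For example, `ls_estimate(simulate_ar([-0.9, 0.2], 5000, random.Random(0)), 2)` should return values close to the true coefficients -0.9 and 0.2.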

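3.2 The YW Method

As can be easily seen from (4), the coefficients $\{a_i\}$ satisfy the
following equations:

$$r_k + a_1 r_{k-1} + \ldots + a_n r_{k-n} = 0, \quad k \ge 1, \qquad (10)$$

where

$$r_k = E\{y(t) y(t+k)\},$$

and where $E\{\cdot\}$ denotes expectation.

Equations (10) are the so-called YW equations and the estimate obtained
after replacing $\{r_k\}$ by

$$\hat{r}_k = \frac{1}{N} \sum_{t=1}^{N-k} y(t) y(t+k), \quad \hat{r}_{-k} = \hat{r}_k, \quad k = 0, 1, 2, \ldots \qquad (11)$$

in the first n equations of (10) is called the YW estimate. Thus, the YW
estimate of $\theta$ is given by [27]

$$\begin{bmatrix} \hat{r}_0 & \hat{r}_1 & \cdots & \hat{r}_{n-1} \\ \hat{r}_1 & \hat{r}_0 & \cdots & \hat{r}_{n-2} \\ \vdots & & \ddots & \vdots \\ \hat{r}_{n-1} & \hat{r}_{n-2} & \cdots & \hat{r}_0 \end{bmatrix} \hat{\theta} = - \begin{bmatrix} \hat{r}_1 \\ \hat{r}_2 \\ \vdots \\ \hat{r}_n \end{bmatrix}. \qquad (12)$$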


Numerically efficient algorithms for solving the linear system (12) exist.
For example, the Levinson-Durbin algorithm solves (12) in $O(n^2)$ arithmetic
operations. The Toeplitz structure of the matrix in (12) makes the YW method
more efficient numerically than the LS method (9). (Equation (9) needs
approximately n/2 times more multiplications than (12).) The LS estimate
(9) and the YW estimate (12) are, however, asymptotically equivalent. For
large N we have

$$\hat{\theta}_{LS} - \hat{\theta}_{YW} = O(1/N). \qquad (13)$$

This result can be readily established.
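The Levinson-Durbin solution of the YW equations can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the sample covariances use the 1/N normalization, which is assumed here to be the one intended in (11).

```python
def sample_cov(y, max_lag):
    """Sample covariances r_0 ... r_max_lag with 1/N normalization (assumed form of (11))."""
    N = len(y)
    return [sum(y[t] * y[t + k] for t in range(N - k)) / N
            for k in range(max_lag + 1)]

def levinson_durbin(r, n):
    """Solve r_k + a_1 r_{k-1} + ... + a_n r_{k-n} = 0, k = 1..n, in O(n^2) operations.

    Returns the coefficient list [a_1, ..., a_n] and the final prediction error.
    """
    a = []
    err = r[0]
    for m in range(1, n + 1):
        # Residual of the m-th equation under the order-(m-1) solution.
        acc = r[m] + sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = -acc / err
        # Order update: a_i <- a_i + k * a_{m-i}, then append a_m = k.
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err
```

A YW estimate of order n is then obtained as `levinson_durbin(sample_cov(y, n), n)[0]`.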


3.3  The  Overdetermined  YW  Method 


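The OYW method is based on the recognition of the fact that the
Yule-Walker equations (10) involving high lag coefficients ($r_k$, k > n)
should be considered when estimating the parameters of (4). Then,
instead of (12), one obtains an overdetermined system of equations which is to
be solved in a least-squares sense. The OYW estimate is thus given by

$$\hat{\theta} = \arg\min_{\theta} \left\| \begin{bmatrix} \hat{r}_0 & \cdots & \hat{r}_{n-1} \\ \vdots & & \vdots \\ \hat{r}_{m-1} & \cdots & \hat{r}_{m-n} \end{bmatrix} \theta + \begin{bmatrix} \hat{r}_1 \\ \vdots \\ \hat{r}_m \end{bmatrix} \right\|_Q^2, \qquad (14)$$

where $\|x\|_Q^2 = x^T Q x$, and Q is a positive definite weighting
matrix of dimension m x m. A numerically stable procedure for solving (14) is
the QR algorithm.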


Intuitively, we expect that the additional equations in (14) will improve
the estimation accuracy, unless the sequence of covariances $r_k$ dies out
rapidly. In other words, for narrowband processes (14) with a relatively
large m should be preferred to (12), while for broad-band processes (12) may
be preferable. The choice of m is discussed in some more detail in Section
5. We generally set Q = I.

The above conjectures pertaining to the choice of m are supported by
practical experience with the method (see, for example, the simulation results
in Section 5). The practical experience contradicts once more the asymptotic
theory. See the next section where it is shown that (12) is asymptotically
more accurate than (14), for any m > n.


4.  ASYMPTOTIC  DISTRIBUTIONS 


The LS estimate $\hat{\theta}_{LS}$ is asymptotically normally distributed with mean
equal to the true parameter vector $\theta$ and covariance matrix given by [27],


In view of the equivalence (13), the YW estimate $\hat{\theta}_{YW}$ has the same
asymptotic distribution.

It follows from [24] that the OYW estimate $\hat{\theta}_{OYW}$ is asymptotically
normally distributed with mean $\theta$ and covariance matrix given by


2 

p  =  ^  (rV)'^R^QSQR(rV)'^ 


where 


0  • 
and


The relation between the covariance matrices $P$ and $\tilde{P}$ is of interest. The
following result holds.

Lemma. Consider the covariance matrices $P$ and $\tilde{P}$ defined by (15) and
(16), respectively. Then,

$$\tilde{P} \ge P. \qquad (17)$$

Proof. See the appendix.

The results in this section are valid for a "sufficiently large" N.
What constitutes a sufficiently large N depends on the $\{a_i\}$ parameters,
or more precisely, on the location of the zeros of the polynomial A(z). This
is illustrated in the next section.

5.  MONTE  CARLO  ANALYSIS 

In this section, we report the results obtained for the following two
second-order AR processes:

$$S_1: \; (1 - 0.9 q^{-1} + 0.2 q^{-2}) \, y(t) = e(t), \qquad (18a)$$

and

$$S_2: \; (1 - 1.75 q^{-1} + 0.76 q^{-2}) \, y(t) = e(t). \qquad (18b)$$

The poles of $S_1$ are located at 0.4 and 0.5; those of $S_2$ are equal to
0.8 and 0.95. For each system, 50 independent realizations of 2000 data
points each have been generated. The noise sequence $\{e(t)\}$ was obtained
using the pseudo-random number generator NORMAL included in the statistical
library of the FELIX/IRIS computer. NORMAL generates independent normal
variables with zero mean and unit variance. The initial values required to
start the recurrent calculations in (18a) and (18b) were simply set to zero.
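The stated pole locations follow directly from the coefficients in (18a) and (18b). The short sketch below (the function name `poles_ar2` is hypothetical) computes the roots of $z^2 + a_1 z + a_2$ for a second-order model written in the form (4).

```python
import cmath

def poles_ar2(a1, a2):
    """Poles of y(t) + a1 y(t-1) + a2 y(t-2) = e(t), i.e. roots of z^2 + a1 z + a2."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return ((-a1 + disc) / 2.0, (-a1 - disc) / 2.0)

# S1 has a1 = -0.9, a2 = 0.2; S2 has a1 = -1.75, a2 = 0.76.
poles_s1 = poles_ar2(-0.9, 0.2)
poles_s2 = poles_ar2(-1.75, 0.76)
```

Both pole pairs are real and inside the unit circle, with those of $S_2$ much closer to it, which is the feature the experiments below turn on.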

The  first  N  samples  of  each  realization,  with  N  =  100,  300,  500,  and 
2000,  have  been  used  to  estimate  the  system  parameters.  The  LS,  YW,  and  OYW 
methods  briefly  described  in  Section  3  have  been  used  to  get  parameter 
estimates.  The  OYW  method  has  been  applied  for  various  values  of  m  (see 
(14)). 

Let $\hat{a}_k^i$ denote the estimate of $a_k$ obtained from the i-th data
realization by using one of the three methods under consideration. The
following quantities have been evaluated (for k = 1, 2):

$$\bar{a}_k = \frac{1}{50} \sum_{i=1}^{50} \hat{a}_k^i \quad \text{(mean value of } \hat{a}_k\text{)},$$

$$\delta(\hat{a}_k) \quad \text{(percentage bias of } \hat{a}_k\text{)},$$

$$\mathrm{var}(\hat{a}_k) = \frac{1}{50} \sum_{i=1}^{50} (\hat{a}_k^i - \bar{a}_k)^2 \quad \text{(variance of } \hat{a}_k\text{)},$$

$$\mathrm{MSE}(\hat{a}_k) = \mathrm{var}(\hat{a}_k) + [\bar{a}_k - a_k]^2 \quad \text{(mean square error of } \hat{a}_k\text{)}.$$

The results obtained in the different cases are displayed in Figures 1 through
6. The asymptotic values of $\mathrm{var}(\hat{a}_k)$ are also shown (in Figures 3 and 6).
(The same symbols are used for the Monte Carlo and asymptotic results. The
asymptotic results are the ones not connected by straight lines.) For
$\delta(\hat{a}_k)$, the asymptotic value is zero. The following remarks can be made
regarding these results.

(1) For the LS method, asymptotic theory holds quite well for all the
sample lengths considered, for both $S_1$ and $S_2$. For the YW and OYW
methods the situation is different. For $S_1$, asymptotic theory is
applicable for reasonably short sample lengths (e.g., for N = 300). However,
for $S_2$, a good agreement between finite-sample and asymptotic behavior was
found only for very long sample lengths (N = 2000). For short sample lengths,
considerable differences between asymptotic theory and practical behavior
occurred, especially for the YW method. For sample lengths of 100, 300, and
500, the YW method is by far the least accurate of those tested, despite the
fact that the asymptotic theory recommends it as being the best. For the OYW
method with m = 20, 30, or 40, the differences between asymptotic theory and
practical results are not so large as for the YW method (e.g., for m = 20 and
30, the estimated and asymptotic values of the variances are in agreement
for N >= 300). It is interesting to note that for large m (e.g., m = 40),
the finite-sample variances may be smaller than the corresponding asymptotic
values.

(2) The LS method outperforms the YW and OYW methods. It gave the
smallest MSE's in almost all the experiments performed. In most cases the LS
method is superior to the YW and OYW methods in terms of both bias and
variance of the parameter estimates. The superiority of the LS method over
the YW and OYW methods is clear in the case of $S_2$. For $S_1$, the LS
method and the YW method gave quite similar results.

The ranking of the OYW methods (m >= n) appears to be in accordance
with the asymptotic theory only for $S_1$. For this system, m = 2
(corresponding to the YW method) gave the best results; when m was increased
beyond 2, the estimation accuracy deteriorated. For $S_2$, the choice of m
to get "best" accuracy is no longer so clear. Here, the "optimal"
finite-sample value of m is certainly larger than the asymptotically optimal
value m = 2. This was also the conclusion of a large number of empirical
studies reported in the signal processing literature. It is difficult,
however, to give precise rules for choosing m. In loose terms, the closer
the system poles are to the unit circle, the larger m should be. For a given
system, the "optimal" value of m depends on N: the larger N, the smaller
m should be (see, for example, the figures).
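The summary statistics used in this section can be computed from a list of per-realization estimates as sketched below. The function name is hypothetical, and the percentage-bias formula, 100 times the bias divided by the true value, is an assumption: the text names the quantity but its defining expression is not legible in the source.

```python
def mc_summary(estimates, true_value):
    """Mean, percentage bias, variance and MSE over Monte Carlo replications."""
    count = len(estimates)
    mean = sum(estimates) / count
    # Variance about the sample mean, normalized by the number of replications.
    var = sum((e - mean) ** 2 for e in estimates) / count
    # Assumed definition: bias expressed as a percentage of the true value.
    bias_pct = 100.0 * (mean - true_value) / true_value
    # MSE = variance + squared bias, as in the definition of MSE above.
    mse = var + (mean - true_value) ** 2
    return {"mean": mean, "bias_pct": bias_pct, "var": var, "mse": mse}
```

In the setting of the paper, `estimates` would hold the 50 values of one coefficient estimate obtained with a given method and sample length.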

6.  CONCLUSIONS 

We  presented  a  Monte  Carlo  analysis  of  the  accuracy  properties  of  several 
methods  for  estimating  the  parameters  of  an  autoregressive  process.  The 
differences  between  finite-sample  accuracy  and  the  theoretical  asymptotic 
accuracy  were  discussed.  These  results  provide  some  useful  insights  into  the 
behavior  of  these  estimators. 
APPENDIX: PROOF OF THE LEMMA


Let 


P 


(R^  S  ^  R)“^ 


(A.l) 


It  is  straightforward  to  show  that 
_  .2 


P  -  P  = 


N“ 


(rV)"^  r'''q  -  (r'''s‘^R)“^  r'^S"^]  s  [(r'^'or)"^  r'^q 


-  (R^S  ^R)"^  R^S'^]^ 


(A.2) 


It  follows  that  P  >P. 

To  conclude  the  proof,  we  next  show  that  P  _>  P.  This  is  equivalent  to 
showing  that 

£  ;^{t)  C7''‘(t)  -  r’’’  s’^  R  >  0 


which  in  turn  is  equivalent  to 


E  Mt)  c;’^(t)  -R' 


!S  J 


y(t-i) 


y(t-n) 

y(t-i) 

y(t-m) 


[y(t-l). 


y(t-n)  y(t-n), 


y(t-(n)]l>  0 


f 
ABSTRACT

The paper presents an algorithm for efficient recursive computation of
the Fisher information matrix of Gaussian time series whose random components
are stationary, and whose means and covariances are functions of a parameter
vector. The algorithm is first developed in a general framework and then
specialized to the case of autoregressive moving-average processes, with
possible additive white noise. The asymptotic behavior of the algorithm is
explored and a termination criterion is derived. Finally, the algorithm is
used to demonstrate the behavior of the exact Cramer-Rao bound for some ARMA
processes, as a function of the number of data points. It is shown that for
processes with zeroes near the unit circle and short data records, the exact
Cramer-Rao bound differs dramatically from its common approximation based on
asymptotic theory.
1. INTRODUCTION

A general time series $\{y_t\}$ can be decomposed as

$$y_t = m_t + x_t, \qquad (1)$$

where $\{m_t\}$ is a deterministic sequence and $\{x_t\}$ is a zero-mean random
sequence. In this paper we consider time series whose random components
$\{x_t\}$ are stationary Gaussian processes. The joint probability density of N
consecutive data points, say $\{y_0, y_1, \ldots, y_{N-1}\}$, is given by

$$f(y) = (2\pi)^{-N/2} [\det R]^{-1/2} \exp\{-\tfrac{1}{2} [y - m]^T R^{-1} [y - m]\}, \qquad (2)$$

where

$$y = [y_0, y_1, \ldots, y_{N-1}]^T, \quad m = [m_0, m_1, \ldots, m_{N-1}]^T,$$

and R is a Toeplitz matrix whose elements are the covariances of $\{x_t\}$,
i.e.

$$R_{i,j} = r_{i-j}, \quad 0 \le i, j \le N-1. \qquad (3)$$

We now specialize our discussion to the case where the sequences
$\{m_0, m_1, \ldots\}$ and $\{r_0, r_1, \ldots\}$ are functions of an M-dimensional vector
$\theta$. Such time series are said to be parametric, and $\theta$ is called the
parameter vector.

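Parametric Gaussian time series are very common in many statistical and
engineering applications. As examples we mention autoregressive (AR) and
autoregressive moving-average (ARMA) processes [1]. A problem of considerable
interest in parametric time series analysis is that of estimating the
parameter vector $\theta$ from a set of N consecutive measurements. As is well
known, the variance of any unbiased estimate $\hat{\theta}$ is bounded from below by the
inverse of the Fisher information matrix, i.e.,

$$E\{\hat{\theta}\} = \theta \;\Longrightarrow\; \mathrm{Var}\{\hat{\theta}\} \ge J^{-1}(\theta), \qquad (4)$$

where

$$[J(\theta)]_{k,\ell} = E\left\{ \frac{\partial \log f(y)}{\partial \theta_k} \, \frac{\partial \log f(y)}{\partial \theta_\ell} \right\}, \quad 1 \le k, \ell \le M. \qquad (5)$$

For Gaussian time series, the Fisher information matrix is given by the
expression

$$[J(\theta)]_{k,\ell} = \frac{1}{2} \, \mathrm{tr}\left\{ R^{-1}(\theta) \frac{\partial R(\theta)}{\partial \theta_k} R^{-1}(\theta) \frac{\partial R(\theta)}{\partial \theta_\ell} \right\} + \left[ \frac{\partial m(\theta)}{\partial \theta_k} \right]^T R^{-1}(\theta) \left[ \frac{\partial m(\theta)}{\partial \theta_\ell} \right], \qquad (6)$$

where tr{.} denotes the trace operator. While formula (6) is known, its
proof does not appear to be readily available in the literature. We,
therefore, provide a proof of this formula in Appendix A.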
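As a small numerical illustration of the trace formula (6), the sketch below evaluates it directly for a zero-mean first-order AR process and compares the result with a closed form. Several things here are assumptions made for the illustration and are not taken from the paper: the model is written as $y_t = a y_{t-1} + e_t$ with known unit noise variance, the single parameter is $a$, the helper names are hypothetical, and the closed form $(N-1)/(1-a^2) + 2a^2/(1-a^2)^2$ was derived separately from the exact likelihood of this particular model.

```python
def mat_inv(A):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def fisher_ar1_exact(a, N):
    """Exact information on a from N samples, via 0.5 * tr(R^-1 dR R^-1 dR)."""
    s = 1.0 - a * a
    r = [a ** k / s for k in range(N)]                       # r_k = a^k / (1 - a^2)
    dr = [(k * a ** (k - 1) if k else 0.0) / s + 2.0 * a ** (k + 1) / s ** 2
          for k in range(N)]                                 # d r_k / d a
    R = [[r[abs(i - j)] for j in range(N)] for i in range(N)]
    dR = [[dr[abs(i - j)] for j in range(N)] for i in range(N)]
    X = mat_mul(mat_inv(R), dR)
    return 0.5 * sum(X[i][j] * X[j][i] for i in range(N) for j in range(N))

def fisher_ar1_closed_form(a, N):
    s = 1.0 - a * a
    return (N - 1) / s + 2.0 * a * a / s ** 2
```

This direct evaluation inverts an N x N matrix for every N, which is the cost that a recursive algorithm is meant to avoid.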

When the mean vector m is zero (or is independent of $\theta$), and when the
number of data points is sufficiently large, the information matrix can be
approximated by Whittle's asymptotic formula [2]

$$[J_N(\theta)]_{k,\ell} \approx \frac{N}{4\pi} \int_{-\pi}^{\pi} \frac{1}{\phi^2(\omega)} \, \frac{\partial \phi(\omega)}{\partial \theta_k} \, \frac{\partial \phi(\omega)}{\partial \theta_\ell} \, d\omega, \qquad (7)$$

where $\phi(\omega)$ is the power spectral density function,

$$\phi(\omega) = r_0 + 2 \sum_{k=1}^{\infty} r_k \cos k\omega. \qquad (8)$$

The use of Whittle's asymptotic formula is quite common in time series
analysis. In particular, for ARMA processes this formula yields a relatively
simple closed-form expression (see, e.g., [1, pp. 240-242]). However, the
quality of the approximation (7) depends heavily on the nature of the process
and on the number of data points, and may yield highly erroneous results if N
is not sufficiently large.

Direct computation of (6) (assuming that the sequences $\{m_0, m_1, \ldots\}$,
$\{r_0, r_1, \ldots\}$ and their partial derivatives are known) requires a number of
operations proportional to $N^3$. In some cases it is desired to compute the
values of $J_n(\theta)$ for all $1 \le n \le N$, in which case the total number of
operations is proportional to $N^4$. This is probably one of the reasons why
the exact formula (6) is not widely applied.

In this paper we derive an algorithm for recursive computation of the
Fisher information matrix. The algorithm computes the information matrices
for all $1 \le n \le N$ in a number of operations proportional to $N^2$. Thus, the
algorithm is considerably more efficient than the direct use of formula (6).
The algorithm is based on the well-known Levinson-Durbin algorithm for
computing the orthogonal polynomials of a Toeplitz matrix.

The general algorithm is derived in section 2 of the paper. In section 3
we specialize it to some common rational parametric models. In section 4 we
discuss the asymptotic behavior of the algorithm and give termination
criteria. In section 5 we illustrate the use of the algorithm by some
examples. It is shown that the exact CRB differs dramatically from the
asymptotic CRB in some cases.

The relative computational efficiency of the algorithm described here
makes it possible to use the exact CRB for performance evaluation of ARMA
estimation algorithms. The exact CRB provides a very useful reference point
for studying and comparing various estimation procedures proposed in the
literature. The fact that in some practical examples the exact CRB differs
considerably from the asymptotic CRB motivates the use of the algorithm
proposed here, rather than using the somewhat simpler asymptotic formulas.

2.  THE  ALGORITHM 


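Let the partial derivatives of $\{r_n\}$ and $\{m_n\}$ be denoted by

$$s_{n,k} = \frac{\partial r_n(\theta)}{\partial \theta_k}, \quad u_{n,k} = \frac{\partial m_n(\theta)}{\partial \theta_k}; \quad 1 \le k \le M. \qquad (9)$$

The values of $\{r_n, s_{n,k}, m_n, u_{n,k}; \; 0 \le n \le N-1, \; 1 \le k \le M\}$ are assumed to
be available to the algorithm.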

Let  p  .  0  .  and  V  .  denote  the  vectors 


in  “  ’*2’  **•’  '"n^ 


*  £h 


,k  *  '•h.k*  ^2,k . ®n,k^ 


CUf,  ^ .  u,  ^ ,  . . . , 


in,k  ’  u^^^ 


Let $R_n$ and $S_{n,k}$ be the (n+1) x (n+1) Toeplitz matrices

$$(R_n)_{i,j} = r_{i-j}; \quad (S_{n,k})_{i,j} = s_{i-j,k}; \quad 0 \le i, j \le n. \qquad (10)$$


Let  a  be  the  (n+l)-dimensional  vector 
— 


1 


}  1 
I  ^ 


(11) 


The components of $a_n$ are the coefficients of the n-th orthogonal (so-called
Levinson-Szego) polynomial of the sequence $\{r_0, r_1, \ldots\}$ [3], [4].

Let $\tilde{I}$ denote the permutation matrix with ones on the antidiagonal and
zeros elsewhere. The dimension of $\tilde{I}$ will be always clear from the context.
Also, for any vector $v$ we denote

$$\tilde{v} = \tilde{I} v,$$

i.e., $\tilde{v}$ is obtained from $v$ by reversing the order of the components of the
vector. Note the following property of the matrices $R_n$ and $S_{n,k}$:

$$\tilde{I} R_n \tilde{I} = R_n, \quad \tilde{I} S_{n,k} \tilde{I} = S_{n,k}.$$

Let  us  partition  the  matrix  in  two  ways,  as  follows 


0  £n  ^  ^ 


in  " 


Using  the  well  known  partitioned  matrix  inversion  formula  [5,  pp.  ],  we  get 


0  0  0 

+  ^-1  T  ,  -1-  -  T 

0  R’^^  "  0  0  'n  Sn  Sn  . 


where 


-C  '  •  '  -s.-  / 


“  '"O  '  £n  '^n-Lin  =  ^0  "  £n  '^n.l£n  ' 


Consider now the $(k,\ell)$-th element of the Fisher information matrix $J_{n+1}(\theta)$
corresponding to the n+1 measurements $\{y_0, \ldots, y_n\}$. Using (6) and (16),
we can expand this element as follows:


[Jn+i^3)lv  ,  *  I  tr(R‘4„  I  +  ^R‘^v 

n+1  2  I  n  n,k  n  n.^i  ^,k  n 


0  0 


0  0 


0  R-'  0  R-1 


"  7  ^n.k 


0  0 

0  n-1  ^n,z' 

^  ^n-1 


0  0 

0  p-l  Vki, 

^  ^n-1 


*  7  ^  ^n.k  2n  J  ^n.il 


a 


I 


r"^ 

n-i 


n  jK  Ml  Ml  -^,1 


CJn^3^-V  c  *  t 

n  K  ,i  fi  Ml  n  ,K 


0  0 


0  R 


-1 

n-1 


^n,ji  ^ 


.  1  -2  T  T  ..-IT-  -T 

7  ^n  ^  n.kin  ^  n.x  2L<i  *^0  -^.k-^  ^,i 


-  [  'Jf%(9^3v  ,  ■*■  If  5  2n  "  T  ^r\^2n  ^n  V  5n  Sn  tin 

fi  K » i  n  — n  j  K  n  n  ^  jt  Wi  c  n  *^1  n  ^  k  — ^  n  ^ 


-1  T  -  -T 

^n  -^.k  ^  ^  ■^.z  * 


(18) 


Let  us  introduce  following  auxiliary  variables. 


^n.k  ^ 

« 

(19a) 

=  R'^S  ^ 

Cl  —  R  n  It 

(19b) 

’n,k 

n  n  ,k 

in  n  Mi,k 

^n,k 

T  - 

*  . 
Ml  n  ,K 

T 

in  ■  in  Jln.k  ’ 

(19c) 

^n,k  ”  ,k  ‘ 


(19d) 


To summarize, the algorithm consists of formulas (23), (25), (22), (21), (27),
(24), (26), (19c), (19d) and (18) in the given order. For convenience, we
have included a summary of the algorithm in appendix B, in a form readily
adaptable for programming. The total operation count for one update is
$n(M^2 + 6M + 4) + (M^2 + 2M + 1)$ multiply/divide operations, and
$n(M^2 + 6M + 4) + (3M^2 + 6M + 3)$ add/subtract operations. Thus, the total
operation count for computing $\{J_n(\theta), \; 1 \le n \le N\}$ is $\frac{1}{2}(M^2 + 6M + 4)N^2
+ \frac{1}{2}(3M^2 + 10M + 6)N$ multiply/divide operations, and $\frac{1}{2}(M^2 + 6M + 4)N^2
+ \frac{1}{2}(7M^2 + 18M + 10)N$ add/subtract operations. This does not include the
computation of $\{r_n, s_{n,k}, m_n, u_{n,k}\}$, which depends on the specific
parametric model.
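The relation between the per-update and total operation counts can be checked mechanically. The sketch below assumes the per-update counts $n(M^2+6M+4) + (M^2+2M+1)$ multiply/divide and $n(M^2+6M+4) + (3M^2+6M+3)$ add/subtract operations, sums them over n = 1, ..., N, and compares the result with the stated totals. The function names are hypothetical, the summation range is an assumption, and the $2M$ term in the multiply count is inferred from consistency with the total because it is not legible in the source.

```python
def per_update_ops(n, M):
    """(multiply/divide, add/subtract) counts for one update at step n."""
    A = M * M + 6 * M + 4
    return n * A + (M * M + 2 * M + 1), n * A + (3 * M * M + 6 * M + 3)

def total_ops(N, M):
    """Stated totals for computing J_n, 1 <= n <= N."""
    A = M * M + 6 * M + 4
    mul_twice = A * N * N + (3 * M * M + 10 * M + 6) * N
    add_twice = A * N * N + (7 * M * M + 18 * M + 10) * N
    return mul_twice // 2, add_twice // 2

def summed_ops(N, M):
    """Sum of the per-update counts over n = 1..N (assumed range)."""
    mul = sum(per_update_ops(n, M)[0] for n in range(1, N + 1))
    add = sum(per_update_ops(n, M)[1] for n in range(1, N + 1))
    return mul, add
```

Under these assumptions `summed_ops(N, M)` and `total_ops(N, M)` agree for every N and M, and both grow as $N^2$ for fixed M.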


3.  RATIONAL  PARAMETRIC  TIME  SERIES 



In this section we present an algorithm for computing the covariances of
rational parametric models, and their partial derivatives. We consider the
following general rational model:

$$y_t = z_t + v_t, \qquad (28)$$

where $\{v_t\}$ is white Gaussian noise with zero mean and variance $\sigma_v^2$, and
$z_t$ is a (p,q) ARMA process,

$$z_t = -\sum_{k=1}^{p} a_k z_{t-k} + u_t + \sum_{k=1}^{q} b_k u_{t-k}, \qquad (29)$$

where $\{u_t\}$ is white Gaussian noise with zero mean and variance $\sigma_u^2$. The
random processes $\{u_t\}$ and $\{v_t\}$ are assumed to be uncorrelated. The parameter
vector  is 


9  *  CoT.  t  a. 


2 

'u* 


''i 


M  *  p  +  q  +  2 


(30) 


The model defined by (28), (29) includes many common rational models as
special cases. The case $\sigma_v = 0$, q = 0 corresponds to a pure AR process,
while the case $\sigma_v \ne 0$, q = 0 corresponds to an AR process in additive
white noise. The case p = 0, $\sigma_v = 0$ corresponds to a pure MA process, while
the case $p \ne 0$, $q \ne 0$ corresponds to an ARMA process. Note that in general,
the additive noise $v_t$ is redundant whenever $q \ge p$, because then it can be
absorbed in $z_t$ by a proper modification of the parameters $\{b_k\}$.

The  covariances  of  {y^}  can  be  computed  as  follows.  Let  us  introduce 
the  auxiliary  AR  process 


4 


Let $\{\gamma_n\}$ denote the covariance sequence of the process $\{w_t\}$. The
first p+1 elements of this sequence can be obtained by solving the equations
[5]

(A^+AjlEYp.  ....  Yi  .  7  =  [0  •••  0  1]\  (32) 

where 


—  — 1 

1  a,  ....  a 

a 

0  .  •  P 

.  .  . 

•  •  . 

•  ^9  * 

0  ‘  ‘  a. 

•  r_ 

C 

d^.«..d«  1 

P  1 

The  higher  order  elements  of  the  sequence  {y  }  can  be  computed  using  the 

n 

recursion 


V 

'  n 


n  >  p  . 


The  covariances  of  {y^}  are  related  to  those  of  {w^}  via 


(34) 


r 


n 


;  n=0 

;  otherwise 


(35) 


The  partial  derivatives  of  the  covariances  can  be  similarly  computed. 


Differentiating  (32)  with  respect  to  a,^  we  get 


-  +A  )r!l2.  !ll  1  !I0iT  -  f  i' 

"1  ^2^3a.  ••••*  3a.  ’  2  3a^J  '  LYp.k”**’^0 . 


By solving these equations for each $1 \le k \le p$, we get
$\{\partial\gamma_0/\partial a_k, \ldots, \partial\gamma_p/\partial a_k\}$, $1 \le k \le p$ (note that $A_1 + A_2$ needs to be inverted only
once). Differentiating (34) with respect to $a_k$ we get


Finally, 


!ln.  ?  ,  '^n-i 


.i,  i  3a,  ■  ^n-k 

1=1  k 


n  _  2 


q  q 


L  .1 

k  1=0  j=0 


^^In.i^jl 


3Yn  2  9 

sHT  '  *^u  .Jq  ^i^^|n-k+il  '*’^ln+k-i|^ 


3Yn  <1 

=  T  "  b.b.Y,  .  . 

^  i=0  j=0  ’  J 


3Yn  1^1 

33^  I  0  ;  ot.herwise 


Equations  (32),  (34),  (35),  (36),  (37),  (38)  provide  an  algorithm  for 
computing  the  covariances  and  their  partial  derivatives  for  the  ARMA  plus 
noise  model. 


4. ASYMPTOTIC BEHAVIOR OF THE ALGORITHM

In this section we limit ourselves to time series with zero means.
Furthermore, we assume that $\{y_t\}$ has nonzero innovation variance $\sigma^2$. By
Whittle's formula (7), the information matrix $J_n(\theta)$ is asymptotically
proportional to n. We therefore expect that the increment

-  4  5”^f  1  f  appearing  in  the  update  formula  (20)  will 
n-in,k-^,i  2  ^n  n,k  n,i 

converge  to  a  constant  value  as  n  goes  to  infinity.  Indeed,  recall  that 

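$$\sigma^2 = \lim_{n \to \infty} \delta_n = r_0 \prod_{i=1}^{\infty} (1 - c_i^2). \qquad (39)$$

Hence,

$$\sum_{i=1}^{\infty} c_i^2 \le -\sum_{i=1}^{\infty} \log(1 - c_i^2) = \log \frac{r_0}{\sigma^2} < \infty. \qquad (40)$$

Therefore,

$$\lim_{n \to \infty} \sum_{i=n+1}^{\infty} c_i^2 = 0. \qquad (41)$$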
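The behavior of the partial correlations can be observed numerically with the standard Levinson-Durbin recursion. The sketch below is an illustration under assumptions: the function names are hypothetical, the test process is a first-order MA process $y_t = u_t + b u_{t-1}$ with unit noise variance (so that its innovation variance is 1 for $|b| < 1$), and the recursion shown is the textbook one, not the information-matrix update of this paper.

```python
import math

def reflection_coefficients(r, n):
    """Partial correlations c_1..c_n and the order-n prediction error variance."""
    a, err, coeffs = [], r[0], []
    for m in range(1, n + 1):
        acc = r[m] + sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        c = -acc / err
        a = [a[i] + c * a[m - 2 - i] for i in range(m - 1)] + [c]
        err *= 1.0 - c * c          # err = r_0 * prod(1 - c_i^2), cf. (39)
        coeffs.append(c)
    return coeffs, err

def ma1_cov(b, n):
    """Covariances r_0..r_n of y_t = u_t + b u_{t-1} with unit-variance u_t."""
    return [1.0 + b * b, b] + [0.0] * (n - 1)

coeffs, err = reflection_coefficients(ma1_cov(0.5, 30), 30)
sum_sq = sum(c * c for c in coeffs)
bound = math.log((1.0 + 0.5 ** 2) / 1.0)   # log(r_0 / sigma^2), cf. (40)
```

Here the prediction error `err` approaches the innovation variance and `sum_sq` stays below `bound`. Moving the zero closer to the unit circle (larger $|b|$) makes the partial correlations decay more slowly.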

It  is  easy  to  show  that  due  to  (40)  the  variables  5  ,  f  .  and 

n  n,k 

iJi  k  in  e  constant  values  as  n  +  «  .  Thus,  for  large  enough  n, 

$J_n(\theta)$ becomes approximately linear in n. This means that for some $n_0$,

$$J_n(\theta) \approx J_{n_0}(\theta) + (n - n_0) \bar{J}(\theta), \quad n \ge n_0, \qquad (42)$$

where $\bar{J}(\theta)$ is a constant matrix. The approximate relationship (42) can be
used to terminate the information updating algorithm, in the following
manner. Suppose we can find some $n_0$ such that


15 


where $\epsilon$ is determined by the desired degree of accuracy. Then we can stop
the algorithm at $n = n_0$, take $\bar{J}(\theta)$ as the last computed increment of
$J_n(\theta)$, and extrapolate $J_n(\theta)$ for all $n > n_0$ using (42). The problem is, of
course, to determine $n_0$ so as to guarantee (43). One way of doing this is to
compute a moving sum of squared partial correlations, say


where $n_1$ is fixed. Then we can choose $n_0$ as the first value of n for
which $K_n < \epsilon$. If the sequence $\{c_n\}$ is sufficiently regular, this criterion
is a reasonable approximation of (43).

For ARMA processes, the value of $n_0$ can be determined by the ARMA
parameters, and there is no need to actually test the partial correlations.
In fact, a conservative estimate of $n_0$ is provided by the following lemma.


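Lemma:

Let $\sigma b(z)/a(z)$ be the transfer function of a (p,q) ARMA process, and assume that all the roots of b(z), $\{\beta_k,\ 1 \le k \le q\}$, are inside the unit circle. Let $n_1$ be an integer such that

$$\sum_{k=1}^{q} |\beta_k|^{n_1} < \tfrac{1}{2}\varepsilon , \qquad (45)$$

and let

$$n_0 = p + q(n_1 - 1) . \qquad (46)$$

Then $n_0$ satisfies (43).

Proof: Let $a_n(z)$ be the n-th order Levinson-Szego polynomial of the given process (i.e., the polynomial whose coefficients are the components of $a_n$). As is well known, $a_n(z)$ minimizes the prediction error variance among all monic polynomials of degree n, and the minimum prediction error variance is $\delta_n$. Therefore,

$$\delta_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left| \frac{a_n(e^{j\omega})\,\sigma b(e^{j\omega})}{a(e^{j\omega})} \right|^2 d\omega \le \frac{1}{2\pi}\int_{-\pi}^{\pi} \left| \frac{x(e^{j\omega})\,\sigma b(e^{j\omega})}{a(e^{j\omega})} \right|^2 d\omega \qquad (47)$$

for any monic n-th degree polynomial x(z). Let us define the following polynomial:

$$x_0(z) = a(z) \prod_{k=1}^{q} \left( 1 + \beta_k z^{-1} + \ldots + \beta_k^{n_1-1} z^{-(n_1-1)} \right) . \qquad (48)$$

By (46), the degree of $x_0(z)$ is $n_0$. Hence by (47),

$$\delta_{n_0} \le \frac{1}{2\pi}\int_{-\pi}^{\pi} \left| \frac{\sigma x_0(e^{j\omega})\, b(e^{j\omega})}{a(e^{j\omega})} \right|^2 d\omega = \frac{\sigma^2}{2\pi}\int_{-\pi}^{\pi} \left| \prod_{k=1}^{q} \left( 1 - (\beta_k e^{-j\omega})^{n_1} \right) \right|^2 d\omega \le \sigma^2 \sup_{\omega} \left| \prod_{k=1}^{q} \left( 1 - (\beta_k e^{-j\omega})^{n_1} \right) \right|^2 \le \sigma^2 \prod_{k=1}^{q} \left( 1 + |\beta_k|^{n_1} \right)^2 . \qquad (49)$$

Hence, by (45),

$$\log\frac{\delta_{n_0}}{\sigma^2} \le 2\sum_{k=1}^{q} \log\left( 1 + |\beta_k|^{n_1} \right) \le 2\sum_{k=1}^{q} |\beta_k|^{n_1} < \varepsilon . \qquad (50)$$

Finally,

$$\sum_{i=n_0+1}^{\infty} c_i^2 \le -\sum_{i=n_0+1}^{\infty} \log(1 - c_i^2) = \log\frac{\delta_{n_0}}{\sigma^2} < \varepsilon . \qquad (51)$$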

In summary, any $n_0$ that satisfies (45), (46) can be used as a termination point for the algorithm. Note that $n_0$ is essentially determined by the zero(es) having the largest magnitude. The denominator polynomial has little effect on $n_0$, except when p >> q or when all the zeroes of b(z) have small magnitudes. For pure AR processes, $c_i = 0$ for all i > p. The relationship (42) then holds exactly for $n_0 = p+1$. This fact was also proven in [6] using different arguments. For ARMA processes having zeroes near the unit circle, partial correlations may converge to zero very slowly. Therefore, for such processes, the Fisher information matrix reaches its asymptotic approximation (7) only at very large values of n.
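As a rough illustration of how the zeroes drive the termination point, the sketch below searches for the smallest $n_1$ for which the bound in (49) implies $\log(\delta_{n_0}/\sigma^2) < \varepsilon$, and then applies (46). The function name `termination_index`, the search cap `n1_max`, and the choice to test the bound $2\sum_k \log(1+|\beta_k|^{n_1})$ directly are assumptions made for this example, not part of the original text.

```python
import cmath
import math


def termination_index(zeros, p, eps, n1_max=1_000_000):
    """Return n0 = p + q*(n1 - 1) for the smallest n1 such that
    2 * sum_k log(1 + |beta_k|**n1) < eps.

    zeros : zeroes beta_k of b(z), all assumed strictly inside the unit circle
    p     : degree of the denominator polynomial a(z)
    """
    q = len(zeros)
    for n1 in range(1, n1_max + 1):
        bound = 2.0 * sum(math.log1p(abs(beta) ** n1) for beta in zeros)
        if bound < eps:
            return p + q * (n1 - 1)
    raise ValueError("no n1 found; are all zeroes inside the unit circle?")


# Assumed example: a conjugate pair of zeroes with magnitude sqrt(0.95)
# at phase angles of +/-135 degrees, and p = 2.
radius = math.sqrt(0.95)
pair = [cmath.rect(radius, math.radians(135.0)),
        cmath.rect(radius, math.radians(-135.0))]
n0_near_circle = termination_index(pair, p=2, eps=0.01)
n0_small_zero = termination_index([0.5], p=1, eps=0.1)
```

Zeroes close to the unit circle give a termination point in the hundreds, while a zero of magnitude 0.5 gives a single-digit value, in line with the remark that the largest-magnitude zero dominates.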


5. NUMERICAL EXAMPLES


In this section we illustrate the behavior of the Fisher information matrix of ARMA processes by some examples. Rather than considering the information matrix itself, we consider the following quantities:

(i) The diagonal elements of $J_n^{-1}(\theta)$; these are the Cramer-Rao bounds on the respective components of the parameter vector $\theta$.

(ii) The Cramer-Rao bound for unbiased estimates of the logarithm of the spectral density; this is given by [7]

$$\mathrm{CRB}\{\log S(\omega)\} = d^T(\omega)\, J_n^{-1}(\theta)\, d(\omega), \qquad (52)$$

where

$$d(\omega) = \frac{1}{S(\omega)} \left[ \frac{\partial S(\omega)}{\partial\theta_1}, \ldots, \frac{\partial S(\omega)}{\partial\theta_M} \right]^T . \qquad (53)$$

Example #1:

In this example we consider an ARMA process of order (2,2), with a pair of conjugate poles and a pair of conjugate zeroes. Both the poles and the zeroes have magnitudes $(0.95)^{1/2}$, and the phase angles are ±45° for the poles and ±135° for the zeroes. The ARMA transfer function is

$$\frac{b(z)}{a(z)} = \frac{1 + 1.378 z^{-1} + 0.95 z^{-2}}{1 - 1.378 z^{-1} + 0.95 z^{-2}} . \qquad (54)$$

Figure 1a shows the CRBs of the parameters $a_1$, $a_2$, $b_1$, $b_2$ as a function of n. The CRBs are in dB and the n axis is in log scale, so that the asymptotic approximations appear as straight lines. As can be seen, the discrepancy between the exact bounds and the asymptotic approximations is very large when the number of data points is small, especially for the numerator parameters. Only at about n = 500 do the exact bounds converge to their asymptotic approximations. Figures 1b and 1c show the exact and asymptotic bounds on the log spectrum (±1 standard deviation) for 50 data points. As can be seen, the behavior of the bound in the vicinity of the pole is similar in both figures. However, its behavior in the vicinity of the zero is considerably different: the asymptotic approximation is far too optimistic.

Example #2:

This example is similar to the previous one, except that the poles were moved to phase angles of ±70° and the zeroes were moved to phase angles of ±110°. The corresponding transfer function is

$$\frac{b(z)}{a(z)} = \frac{1 + 0.667 z^{-1} + 0.95 z^{-2}}{1 - 0.667 z^{-1} + 0.95 z^{-2}} . \qquad (55)$$

Figure  2a  shows  the  bounds  on  the  parameters.  Figure  2b  shows  the  exact  bounds 
on  the  spectrum,  and  Figure  2c  depicts  the  approximate  bounds  on  the 
spectrum.  Note  the  difference  in  the  bound  of  compared  to  the  previous 
example. 

Example #3:

Here we moved the pole and the zero even closer to each other. The poles have phase angles of ±85° and the zeroes have phase angles of ±95°. The corresponding transfer function is

$$\frac{b(z)}{a(z)} = \frac{1 + 0.17 z^{-1} + 0.95 z^{-2}}{1 - 0.17 z^{-1} + 0.95 z^{-2}} . \qquad (56)$$


The  bounds  are  shown  in  Figures  3a,  3b,  3c.  Note  the  dramatic  change  in 
the  bounds  of  a^^  and  compared  to  the  previous  examples,  for  small  values 
of  n. 
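The coefficients quoted in Examples 1 to 3 follow directly from the pole and zero locations: a conjugate pair at radius r and angle θ gives the factor $1 - 2r\cos\theta\, z^{-1} + r^2 z^{-2}$. The short sketch below reproduces them; the helper name `second_order_coeffs` is hypothetical and introduced only for this illustration.

```python
import math


def second_order_coeffs(radius, angle_deg):
    """Coefficients (1, c1, c2) of 1 + c1*z^-1 + c2*z^-2 for a conjugate
    pair of roots at radius*exp(+/- j*angle)."""
    c1 = -2.0 * radius * math.cos(math.radians(angle_deg))
    c2 = radius * radius
    return (1.0, c1, c2)


r = math.sqrt(0.95)
for pole_angle in (45.0, 70.0, 85.0):
    zero_angle = 180.0 - pole_angle          # 135, 110, 95 degrees
    a = second_order_coeffs(r, pole_angle)   # denominator a(z)
    b = second_order_coeffs(r, zero_angle)   # numerator b(z)
    print(pole_angle, [round(x, 3) for x in a], [round(x, 3) for x in b])
```

Because the zero angles are the supplements of the pole angles, the numerator and denominator differ only in the sign of the $z^{-1}$ coefficient, as in (54) to (56).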


Example #4:

For this and the subsequent examples the model was a sum of two uncorrelated narrowband processes and white noise. Such a process has a spectral density function

$$S(\omega) = \sum_{i=1}^{2} \frac{E_i\left[ (1-\rho_i^4) - (1-\rho_i^2)\rho_i \cos 2\pi f_i\, (e^{j\omega} + e^{-j\omega}) \right]}{1 + 4\rho_i^2\cos^2 2\pi f_i + \rho_i^4 - 2\rho_i\cos 2\pi f_i\,(1+\rho_i^2)(e^{j\omega} + e^{-j\omega}) + \rho_i^2 (e^{j2\omega} + e^{-j2\omega})} + \sigma_v^2 . \qquad (57)$$

In this example we chose $\rho_1 = \rho_2 = 0.99$, $f_1 = 0.2$ Hz, $f_2 = 0.225$ Hz, $E_1 = E_2 = 1$, $\sigma_v^2 = 2$. Thus, the SNR is -3 dB for each of the two narrowband processes. The equivalent ARMA description of this process is

$$\frac{\sigma b(z)}{a(z)} = \frac{1.5856\,(1 - 0.8706 z^{-1} + 1.9194 z^{-2} - 0.7610 z^{-3} + 0.7641 z^{-4})}{1 - 0.9217 z^{-1} + 2.1502 z^{-2} - 0.9036 z^{-3} + 0.9606 z^{-4}} . \qquad (58)$$

The number of data points was chosen to be n = 50. Figures 4a and 4b show the exact and the approximate bounds on the spectrum. Note that the peaks of the lower bound are lower than the dip of the upper bound. This means that with high probability the two narrowband processes cannot be resolved by any unbiased estimator of the spectrum. This phenomenon cannot be predicted by
the asymptotic approximation, but only by the exact bound.

Example #5:

This example is similar to the previous one, except that the white noise variance was increased to 4 (i.e., the SNR is -6 dB). The equivalent ARMA description of this process is

$$\frac{\sigma b(z)}{a(z)} = \frac{2.1630\,(1 - 0.8863 z^{-1} + 1.9883 z^{-2} - 0.8032 z^{-3} + 0.8212 z^{-4})}{1 - 0.9217 z^{-1} + 2.1502 z^{-2} - 0.9036 z^{-3} + 0.9606 z^{-4}} .$$

The exact and approximate spectral bounds are shown in Figures 5a and 5b, for n = 50. As we see, the peaks of the lower bound are much below the dip of the upper bound, so that the two processes are not likely to be resolved at all.


Example #6:

This example is similar to the two previous ones, except that $\sigma_v^2 = 8$, i.e., the SNR is -9 dB. The equivalent ARMA description of this process is

$$\frac{\sigma b(z)}{a(z)} = \frac{2.9798\,(1 - 0.8979 z^{-1} + 2.0407 z^{-2} - 0.8353 z^{-3} + 0.8655 z^{-4})}{1 - 0.9217 z^{-1} + 2.1502 z^{-2} - 0.9036 z^{-3} + 0.9606 z^{-4}} .$$

Figures 6a and 6b show the exact and the asymptotic spectral bounds. Here even the asymptotic approximation indicates that the two processes cannot be resolved. However, the exact bound indicates that with high probability neither of the processes can be detected.
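The numbers in Examples 4 to 6 can be cross-checked with a short calculation. The sketch below multiplies the two second-order factors $1 - 2\rho\cos(2\pi f)\, z^{-1} + \rho^2 z^{-2}$ to obtain the common denominator, and converts the signal-to-noise ratios to dB. The value ρ = 0.99 is an assumption inferred here from the $z^{-4}$ coefficient 0.9606 ≈ 0.99⁴, and the function names are hypothetical.

```python
import math


def two_mode_denominator(rho, f1, f2):
    """Coefficients of the 4th-order denominator in powers of z^-1 for two
    narrowband modes with pole radius rho and center frequencies f1, f2
    (cycles per sample)."""
    def factor(f):
        return [1.0, -2.0 * rho * math.cos(2.0 * math.pi * f), rho * rho]

    u, v = factor(f1), factor(f2)
    out = [0.0] * 5
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj      # polynomial multiplication
    return out


def snr_db(signal_power, noise_variance):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(signal_power / noise_variance)


denominator = two_mode_denominator(0.99, 0.2, 0.225)
snrs = [snr_db(1.0, v) for v in (2.0, 4.0, 8.0)]   # Examples 4, 5, 6
```

Under these assumptions the computed denominator agrees with the printed coefficients to about three decimal places, and the three noise variances give roughly -3, -6 and -9 dB.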


Example #7:

This example is similar to example #4, except that $f_2$ was changed to 0.2125 Hz. The equivalent ARMA description of this process is

$$\frac{\sigma b(z)}{a(z)} = \frac{1.5723\,(1 - 1.0192 z^{-1} + 2.0215 z^{-2} - 0.8985 z^{-3} + 0.7771 z^{-4})}{1 - 1.0738 z^{-1} + 2.2438 z^{-2} - 1.0529 z^{-3} + 0.9605 z^{-4}} .$$

Figures 7a and 7b show the exact and the asymptotic spectral bounds. The two narrowband processes are evidently indistinguishable, but the approximate bound fails to indicate this.


Example #8:

This example is similar to examples #4 and #7, except that $f_2$ was changed to 0.25 Hz. The equivalent ARMA description of this process is

$$\frac{\sigma b(z)}{a(z)} = \frac{1.5943\,(1 - 0.5763 z^{-1} + 1.7395 z^{-2} - 0.5011 z^{-3} + 0.7558 z^{-4})}{1 - 0.6119 z^{-1} + 1.9603 z^{-2} - 0.5997 z^{-3} + 0.9606 z^{-4}} .$$

Figures 8a and 8b show the exact and the asymptotic spectral bounds. Now the two frequencies are sufficiently far apart, so that the two bounds are similar and both indicate that the two can be easily resolved.


6. CONCLUSIONS


We presented an algorithm for computing the exact Fisher information matrix of parametric Gaussian time series whose random components are stationary. The algorithm is computationally efficient, requiring a number of operations proportional to  for computing the matrices $\{J_n(\theta),\ 1 \le n \le N\}$. The Cramer-Rao bound for unbiased estimates of the parameters is simply the inverse of the information matrix.

The algorithm was specialized to the case of ARMA processes with additive white noise, and closed form expressions were derived for the covariances and their partial derivatives. Some common nonstationary time series can be similarly handled, such as sums of sinusoids in white or colored noise, rational impulse responses in white or colored noise, etc.

Examination of the exact information matrix of ARMA processes reveals an interesting fact. As is well known, the asymptotic information matrix of ARMA processes is symmetric in the numerator and denominator parameters. In other words, interchanging the numerator and the denominator polynomials leaves the information matrix unchanged, except for row and column permutations [1, p. 240]. However, the exact information matrix does not share this symmetry property. See for example the difference between the denominator and the numerator parameters in Figure 1a, when the number of data points is small. This observation offers a partial explanation of the well known fact that with a small number of data points it is much more difficult to accurately estimate zeroes than poles.

We finally note that the Cramer-Rao bound for short data records is not necessarily tight, i.e., efficient ARMA algorithms may not exist. However, the CRB still provides a lower bound on the performance of any given algorithm. We should stress that the bound applies to unbiased estimates only. While there is no evidence that there exist unbiased ARMA estimation methods, most existing algorithms are designed to be approximately unbiased. For such algorithms, the inverse of the information matrix offers a
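reasonable measure of achievable performance.

APPENDIX A: PROOF OF THE INFORMATION MATRIX FORMULA

The logarithm of the joint density function f(y) given in (2) is

$$\log f(y) = -\frac{N}{2}\log 2\pi - \frac{1}{2}\log\det R(\theta) - \frac{1}{2}[y-m(\theta)]^T R^{-1}(\theta)[y-m(\theta)] . \qquad (A.1)$$

Differentiation with respect to $\theta_k$ yields

$$\frac{\partial\log f(y)}{\partial\theta_k} = -\frac{1}{2}\mathrm{tr}\left\{ R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_k} \right\} + \frac{1}{2}[y-m(\theta)]^T R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_k}R^{-1}(\theta)[y-m(\theta)] + \left[\frac{\partial m(\theta)}{\partial\theta_k}\right]^T R^{-1}(\theta)[y-m(\theta)] . \qquad (A.2)$$

Multiplying $\partial\log f(y)/\partial\theta_k$ by $\partial\log f(y)/\partial\theta_\ell$ and taking expected value yields

$$E\left\{ \frac{\partial\log f(y)}{\partial\theta_k}\frac{\partial\log f(y)}{\partial\theta_\ell} \right\} = -\frac{1}{4}\mathrm{tr}\left\{ R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_k} \right\}\mathrm{tr}\left\{ R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_\ell} \right\} + \left[\frac{\partial m(\theta)}{\partial\theta_k}\right]^T R^{-1}(\theta)\frac{\partial m(\theta)}{\partial\theta_\ell}$$
$$\qquad + \frac{1}{4}E\left\{ [y-m(\theta)]^T R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_k}R^{-1}(\theta)[y-m(\theta)]\,[y-m(\theta)]^T R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_\ell}R^{-1}(\theta)[y-m(\theta)] \right\} . \qquad (A.3)$$

To evaluate the last term let us denote

$$x = y - m(\theta) ; \quad A = R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_k}R^{-1}(\theta) ; \quad B = R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_\ell}R^{-1}(\theta) . \qquad (A.4)$$

Then, writing R for $R(\theta)$ and using the fourth-moment formula for zero-mean Gaussian variables,

$$E\{x^T A x\, x^T B x\} = E\Big\{ \sum_{i,j,m,n} x_i A_{ij} x_j x_m B_{mn} x_n \Big\} = \sum_{i,j,m,n} A_{ij}B_{mn}\, E\{x_i x_j x_m x_n\}$$
$$= \sum_{i,j,m,n} A_{ij}B_{mn}\left( R_{ij}R_{mn} + R_{im}R_{jn} + R_{in}R_{jm} \right)$$
$$= \sum_{i,j} A_{ij}R_{ij}\sum_{m,n} B_{mn}R_{mn} + \sum_{i,j,m,n} A_{ij}R_{jn}B_{nm}R_{mi} + \sum_{i,j,m,n} A_{ij}R_{jm}B_{mn}R_{ni}$$
$$= \mathrm{tr}\{AR\}\cdot\mathrm{tr}\{BR\} + 2\,\mathrm{tr}\{ARBR\}$$
$$= \mathrm{tr}\left\{ R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_k} \right\}\cdot\mathrm{tr}\left\{ R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_\ell} \right\} + 2\,\mathrm{tr}\left\{ R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_k}R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_\ell} \right\} .$$

Substituting this result into (A.3), the products of traces cancel, leaving $\frac{1}{2}\mathrm{tr}\{R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_k}R^{-1}(\theta)\frac{\partial R(\theta)}{\partial\theta_\ell}\} + [\frac{\partial m(\theta)}{\partial\theta_k}]^T R^{-1}(\theta)\frac{\partial m(\theta)}{\partial\theta_\ell}$.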
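The fourth-moment step used above can be checked numerically without any sampling. The sketch below evaluates the index sum $\sum A_{ij}B_{mn}(R_{ij}R_{mn} + R_{im}R_{jn} + R_{in}R_{jm})$ directly and compares it with $\mathrm{tr}\{AR\}\,\mathrm{tr}\{BR\} + 2\,\mathrm{tr}\{ARBR\}$ for symmetric matrices. The matrices and helper names are arbitrary choices made for this illustration.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two square matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def quartic_expectation(A, B, R):
    """E{x'Ax x'Bx} for zero-mean Gaussian x with covariance R,
    expanded with the Gaussian fourth-moment formula."""
    n = len(R)
    total = 0.0
    for i in range(n):
        for j in range(n):
            for m in range(n):
                for l in range(n):
                    moment = (R[i][j] * R[m][l] + R[i][m] * R[j][l]
                              + R[i][l] * R[j][m])
                    total += A[i][j] * B[m][l] * moment
    return total


def trace_formula(A, B, R):
    AR, BR = matmul(A, R), matmul(B, R)
    return trace(AR) * trace(BR) + 2.0 * trace(matmul(AR, BR))


# Arbitrary symmetric test matrices (illustrative values only).
R = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
A = [[1.0, 0.5, 0.0], [0.5, 2.0, 0.0], [0.0, 0.0, 1.0]]
B = [[0.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 1.0]]
lhs = quartic_expectation(A, B, R)
rhs = trace_formula(A, B, R)
```

The two values agree to rounding error; the equality relies on A, B and R being symmetric, which holds for the matrices defined in (A.4).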


APPENDIX B: SUMMARY OF THE ALGORITHM


Inputs : 


N:  number  of  data  points 


{■^Q*  •  •  •  :  covariances  of  the  given  process 


{  . . s. 


-1  k  ’  ^  ^  ■  Partial  derivatives  of  the  covariances 


''*^0,k  ’ '  ’  ‘ k  ’  ^  ^  ^  Partial  derivatives  of  the  mean  vector 


Initial ization; 


in  =  1;  a  =  0 

j  n 


,  n=l,,...M-l; 


5  = 


"'O.k  ^  ^0,k 


.  k=l,...,M; 


’0,k  *  ^0,k^'"o 

1  1  /  2 

■^Uka  7  ^0,k^0,i/'‘0 


,  k=l,...,M; 


Do  for  n=l , . , . ,M-1 : 


=  =  I  I  0.  r  ]/5  . 
i=0  ’  ^  ^ 

n-1 

a,  =  I  a .  S 
<  ^-0  1  n-1  ,k 


,  k=l,...,M; 


6  =  5*(1-C  ) 




a^.  »  t. 


■  ^Iq  '’i.kVi 


,l<  "  '"'^n-i  ,k 

'^i.k  * 


*  k* 1  ^  » f  A  \ 


,  1 *0 ,  • • • f  n  j 


.  i=0,..,n; 


k=l,....M 


*  ^i.k  ■  “^n-i,k  ^  ° . "  ’ 


^i.k  *  ^ 


»  ^  ~0 » •  •  •  1  » 


k  =  1,...,M 


f,  =  'i  a.n.  , 
k  1  i.k 


,  k=l....,M  ; 


S|<’^Vf“i,k  . ”  ' 

n  .  , 

-H+l.k.i  =  %.k.z  ^  "  7 


..M  ; 


Comments: 


The vectors $a$, $c_k$ and the scalar $\delta$ are overwritten at each step by the new values. This helps keep storage requirements proportional to N, rather than $N^2$. The temporary storage vector
_t  is  used  in  updating  the  vectors  i*  • 

The Fisher information matrix is not overwritten at each step; however, the algorithm can be easily modified by letting the new value of $J$ overwrite the old value.


FIGURE CAPTIONS

Figure 1: Example 1: a) the asymptotic and exact Cramer-Rao bounds for the ARMA parameter estimates, b) the exact Cramer-Rao bound for the spectrum, c) the asymptotic Cramer-Rao bound for the log spectrum.

Figure 2: Example 2: a) the asymptotic and exact Cramer-Rao bounds for the ARMA parameter estimates, b) the exact Cramer-Rao bound for the spectrum, c) the asymptotic Cramer-Rao bound for the log spectrum.

Figure 3: Example 3: a) the asymptotic and exact Cramer-Rao bounds for the ARMA parameter estimates, b) the exact Cramer-Rao bound for the spectrum, c) the asymptotic Cramer-Rao bound for the log spectrum.

Figure 4: Example 4: a) the exact Cramer-Rao bound for the log spectrum, b) the asymptotic Cramer-Rao bound for the log spectrum.

Figure 5: Example 5: a) the exact Cramer-Rao bound for the log spectrum, b) the asymptotic Cramer-Rao bound for the log spectrum.

Figure 6: Example 6: a) the exact Cramer-Rao bound for the log spectrum, b) the asymptotic Cramer-Rao bound for the log spectrum.

Figure 7: Example 7: a) the exact Cramer-Rao bound for the log spectrum, b) the asymptotic Cramer-Rao bound for the log spectrum.

Figure 8: Example 8: a) the exact Cramer-Rao bound for the log spectrum, b) the asymptotic Cramer-Rao bound for the log spectrum.

ABSTRACT

This paper proposes an algorithm for estimating the power spectra of multichannel wide-sense stationary processes. The processes are modeled as the output of a multivariable linear system driven by white noise. The transfer function of the system is given by a numerator matrix of polynomials divided by a scalar denominator polynomial. The denominator polynomial is estimated first, using the overdetermined, order over-estimated, modified Yule-Walker method. Modal decomposition is used to eliminate superfluous modes to reduce the order of the transfer function. Finally, the numerator of the spectral density matrix is estimated.


This work was supported by the Army Research Office under contract no. DAAG29-83-C-0027.


1.  INTRODUCTION 


Parametric models are widely used in the statistical analysis of scalar time series. In particular, autoregressive (AR) and autoregressive moving-average (ARMA) modeling has proven to be very successful in many applications [1],[2]. Many problems of practical interest involve vector processes. As examples we mention signals in acoustic and seismic arrays. It often happens in such applications that important information is present in the cospectra of the various channels (rather than in the autospectra). In such applications it is necessary to perform multichannel processing in order to extract the desired information.

Traditional  multichannel  time  series  analysis  is  based  on  the  use  of 
periodograms  and  windowed  periodograms  [3].  Multichannel  maximum  entropy 
spectral  analysis  has  also  gained  some  popularity  in  recent  years  [4]. 
Parametric modeling for multichannel time series was discussed by several authors [3],[5],[6]. Usually, the multichannel ARMA model, which is a special case of left matrix fraction description [7], is used in these discussions.

The  main  problem  in  using  parametric  models  for  multichannel  time  series 
is  their  high  dimensionality.  The  number  of  free  parameters  is  generally 
proportional  to  the  square  of  the  number  of  channels.  Note  that  even  a 
relatively  simple  problem  involving  a  two-channel  ARMA  model  of  order  (2,2) 
has  20  free  parameters.  Simultaneous  estimation  of  so  many  parameters  using  a 
maximum  likelihood  method  is  difficult.  Problems  such  as  obtaining  initial 
conditions,  searching  among  multiple  local  minima  and  selecting  the 
appropriate  order  are  extremely  difficult  to  handle. 

In this paper we propose a parametric spectral estimation algorithm which is aimed at circumventing some of the practical difficulties encountered in maximum likelihood estimation. The algorithm uses the sample covariances (rather than the data directly), and is an extension of the scalar modified Yule-Walker (MYW) method with modal decomposition, reported in [8]. The proposed technique is non-iterative and for the most part requires the solution of sets of linear equations. The parameters of the denominator and numerator of the spectral density matrix are estimated in two separate steps. This alleviates somewhat the problem of high dimensionality.

While  the  proposed  estimation  procedure  is  not  asymptotically  efficient, 
it  appears  to  be  more  robust  and  considerably  less  complex  (in  terms  of 
computational  requirements)  than  the  maximum  likelihood  estimator.  The  MYW 
based  approach  seems,  therefore,  better  suited  for  practical  spectral  analysis 
problems than the maximum likelihood approach. The facts that initial conditions are not required and that the computations consist largely of linear least-squares fits make the proposed approach especially attractive.

The  outline  of  the  paper  is  as  follows.  In  section  2  we  present  the 
model  to  be  used,  and  introduce  some  basic  notations.  In  section  3  we  give  a 
detailed  description  of  the  algorithm.  In  section  4  we  illustrate  the 
performance  of  the  algorithm  by  some  simulation  examples.  Section  5  discusses 
the  main  advantages  and  drawbacks  of  the  proposed  technique,  and  suggests  some 
possible modifications.

2. THE MODEL


Let $\{y_t\}$ be a p-dimensional zero-mean wide-sense stationary process. We assume that $\{y_t\}$ is related to some p-dimensional white noise process $\{w_t\}$ by a rational p×p transfer matrix H(z), i.e.

$$y(z) = H(z)w(z) , \qquad (1)$$

where y(z) and w(z) are the formal z-transforms of $\{y_t\}$ and $\{w_t\}$. The transfer matrix H(z) is assumed to be stable and causal. The covariance matrix of $w_t$ can be assumed, without loss of generality, to be the identity matrix (since this covariance matrix can always be absorbed in H(z)).

The model (1) includes many common parametric models as special cases. For example, the ARMA model

$$y(z) = A_L^{-1}(z)B_L(z)w(z) , \qquad (2)$$

is clearly of the form (1). In this case H(z) is written in the form of a left matrix fraction description (MFD). The AR plus noise model

$$x(z) = A_L^{-1}(z)u(z) \qquad (3a)$$

$$y(z) = x(z) + v(z) , \qquad (3b)$$

where $\{u_t\}$ and $\{v_t\}$ are uncorrelated white noise sequences, can also be transformed to the form (1). In some applications, the natural description of the process is in terms of a right MFD,

$$y(z) = B_R(z)A_R^{-1}(z)w(z) , \qquad (4)$$

see e.g. [9] for such an application.

The main difficulty in using the models (2), (3) or (4) for spectral estimation lies in the fact that the denominators of these models are polynomial matrices. Therefore, the modes of the spectrum do not appear explicitly, but are "hidden" in the determinant of the corresponding matrix polynomial ($A_L(z)$ or $A_R(z)$). An alternative model, which makes it easier to display the spectral modes, is given by

$$H(z) = \frac{1}{a(z^{-1})}B(z^{-1}) , \qquad (5)$$

where $a(z^{-1})$ is the least common multiple of the denominators of the entries of H(z), expressed in powers of $z^{-1}$. The matrix polynomial $B(z^{-1})$ is also expressed in powers of $z^{-1}$. In general, $a(z^{-1})$ and $B(z^{-1})$ are of the same degree,

$$a(z^{-1}) = 1 + a_1 z^{-1} + \ldots + a_n z^{-n} \qquad (6a)$$

$$B(z^{-1}) = B_0 + B_1 z^{-1} + \ldots + B_n z^{-n} . \qquad (6b)$$

We note that while the model (5) is quite general, it is usually overparametrized. For example, let us compare the number of parameters in (5) to the number of parameters in the ARMA model (2). A p-dimensional ARMA(m,m) model has $(2m+1)p^2$ free parameters. The corresponding characteristic polynomial has degree n = mp, so that in (5) we have $mp + (mp+1)p^2$ parameters. For p = 2 we have 8m+4 and 10m+4 parameters, respectively. Thus, for two-channel time series, the model (5) is only slightly overparametrized.
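The parameter counts quoted above are easy to tabulate. The following sketch simply evaluates the two formulas from the text; the function names are hypothetical.

```python
def arma_param_count(p, m):
    """Free parameters of a p-dimensional ARMA(m, m) model: (2m+1) p^2."""
    return (2 * m + 1) * p * p


def scalar_denominator_param_count(p, m):
    """Parameters of model (5) with n = m*p: n scalar denominator
    coefficients plus (n+1) numerator matrices of size p x p."""
    n = m * p
    return n + (n + 1) * p * p


for m in (1, 2, 4):
    print(m, arma_param_count(2, m), scalar_denominator_param_count(2, m))
```

For p = 2 and m = 2 this gives 20 and 24 parameters, so the overhead of the scalar-denominator form is modest for two channels; it grows faster for larger p because the degree n = mp multiplies the $p^2$ numerator entries.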

Let S(z) be the spectral density matrix of the process $\{y_t\}$. This matrix is given by

$$S(z) = \frac{1}{a(z^{-1})a(z)}B(z^{-1})B^T(z) = \frac{1}{a(z^{-1})a(z)}N(z) , \qquad (7)$$

where

$$N(z) = B(z^{-1})B^T(z) = N_{-n}z^{n} + \ldots + N_{-1}z + N_0 + N_1 z^{-1} + \ldots + N_n z^{-n} . \qquad (8)$$

Next we write S(z) in terms of the covariance sequence $\{R_i\}$,

$$R_i = E\{y_t y_{t-i}^T\} = R_{-i}^T , \quad -\infty < i < \infty . \qquad (9)$$

Let

$$S_+(z^{-1}) = \frac{1}{2}R_0 + \sum_{i=1}^{\infty} R_i z^{-i} \qquad (10)$$

be the causal part of the spectrum. Clearly,

$$S(z) = S_+(z^{-1}) + S_+^T(z) . \qquad (11)$$

The causal part can be expressed as

$$S_+(z^{-1}) = \frac{1}{a(z^{-1})}C(z^{-1}) . \qquad (12)$$

From (7), (11) and (12) we see that N(z), $C(z^{-1})$ and $a(z^{-1})$ are related by

$$N(z) = C(z^{-1})a(z) + C^T(z)a(z^{-1}) . \qquad (14)$$

As we will see in the next section, the parameters of the matrix $C(z^{-1})$ can be estimated by a relatively simple procedure.


3. THE ALGORITHM


The proposed algorithm is based on estimating the coefficients of the rational spectral density matrix from the sample covariances. These are computed from the measured data by

$$\hat{R}_i = K(T,i)\sum_{t=i+1}^{T} y_t y_{t-i}^T , \qquad (15)$$

where K(T,i) is either 1/T or 1/(T-i) (the biased or unbiased covariances). Each sample covariance $\hat{R}_i$ is a p×p matrix. We also define for each i a $p^2$-dimensional column vector $\hat{\rho}_i$ obtained by stacking the columns of $\hat{R}_i$, i.e.,

$$\hat{\rho}_i = [\hat{R}_i(1,1), \ldots, \hat{R}_i(p,1), \ldots, \hat{R}_i(1,p), \ldots, \hat{R}_i(p,p)]^T . \qquad (16)$$
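A minimal sketch of the sample covariance and column-stacking operations defined above, written with plain Python lists so that it stays self-contained. The function names `sample_covariance` and `stack_columns` are hypothetical, and the data are assumed to be zero-mean as in the model.

```python
def sample_covariance(y, lag, unbiased=False):
    """Sample covariance matrix at the given lag.

    y        : list of T observations, each a list of p numbers (zero-mean)
    lag      : non-negative integer i
    unbiased : scale by 1/(T - i) if True, else by 1/T
    """
    T, p = len(y), len(y[0])
    scale = 1.0 / (T - lag) if unbiased else 1.0 / T
    R = [[0.0] * p for _ in range(p)]
    for t in range(lag, T):              # 0-based version of t = i+1..T
        for a in range(p):
            for b in range(p):
                R[a][b] += y[t][a] * y[t - lag][b]
    return [[scale * v for v in row] for row in R]


def stack_columns(R):
    """Stack the columns of a matrix into one long vector."""
    rows, cols = len(R), len(R[0])
    return [R[i][j] for j in range(cols) for i in range(rows)]


y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
R1 = sample_covariance(y, lag=1, unbiased=True)
rho1 = stack_columns(R1)
```

The biased version (scaling by 1/T) is often preferred in practice because it keeps the covariance sequence positive semidefinite; which scaling the original work used in its simulations is not stated in this section.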


Note that any estimation algorithm based on sample covariances will not be efficient in the statistical sense, i.e., it will not achieve the Cramer-Rao lower bound, even asymptotically (except in the special case of pure autoregressive processes) [10]. However, by increasing the number of sample covariances used in the algorithm the loss of efficiency can be made quite small [11]. Furthermore, spectral estimation algorithms based on sample covariances are known to be more robust than algorithms of the maximum likelihood type (i.e., they are less sensitive to initial conditions, model inaccuracy, or the choice of the number of parameters).

The algorithm consists of three steps. In the first step, an initial estimate of the characteristic polynomial is obtained by a multichannel version of the modified Yule-Walker equations. This initial estimate has a high degree in general, compared to the degree of the true characteristic polynomial. Thus, at the second step, the degree of the initial estimate is reduced to yield the final estimate of the characteristic polynomial. This is done using modal decomposition of the causal part of the spectrum and an appropriate elimination process. Finally, in the third step, the numerator matrices are estimated by a least squares technique, using the estimates of the characteristic polynomial from the second step.

The three steps of the algorithm will now be described in greater detail.


3.1 INITIAL ESTIMATION OF THE CHARACTERISTIC POLYNOMIAL

The covariances of the process $\{R_i\}$ can be easily shown to satisfy the Yule-Walker type equations

$$\sum_{k=1}^{n} a_k R_{i-k} = -R_i ; \quad i \ge n+1 . \qquad (17)$$

Substituting the sample covariances for the true covariances in (17) we get the so-called modified Yule-Walker equations

$$\sum_{k=1}^{n} \hat{a}_k \hat{R}_{i-k} = -\hat{R}_i ; \quad i \ge n+1 , \qquad (18)$$

or equivalently,

$$\sum_{k=1}^{n} \hat{a}_k \hat{\rho}_{i-k} = -\hat{\rho}_i ; \quad i \ge n+1 . \qquad (19)$$

It was demonstrated experimentally in [12], and proven mathematically in [13], that by taking an overdetermined set of equations of the form (19) and solving them in the least-squares sense, the statistical efficiency of the estimated characteristic polynomial coefficients can be improved, compared to the case where only the minimal number of equations is used. Thus, in practice we solve the following set of equations in the least-squares sense:

$$\sum_{k=1}^{n_2} \hat{a}_k \hat{\rho}_{i-k} = -\hat{\rho}_i ; \quad n_2+1 \le i \le n_2+n_1 . \qquad (20)$$


The number of equations in (20) is $p^2 n_1$, and the number of unknowns is $n_2$. It was shown in [8],[14] that, if $n_2$ is taken as the true degree of the process characteristic polynomial, the estimates $\{\hat{a}_1, \ldots, \hat{a}_{n_2}\}$ may be considerably biased in some cases. This can be intuitively explained as follows: Equation (20) has the form of a least-squares autoregressive fit of the "data" $\{\hat{\rho}_1, \hat{\rho}_2, \ldots\}$. It is well known that the estimates $\{\hat{a}_1, \ldots, \hat{a}_{n_2}\}$ are unbiased only when the error between the two sides of (20) is a white noise. However, for general rational models, the sequence $\{\hat{\rho}_1, \hat{\rho}_2, \ldots\}$ does not follow an exact autoregression, and the error will not be white. By taking a sufficiently large order $n_2$ in (20), we can approximately "whiten" the error sequence. Based on this intuitive argument, the use of Akaike's information criterion was advocated in [8] to determine the value of $n_2$. Here we follow the same choice, but mention that other choices have been proposed, e.g. [15],[16].

The number of equations in (20) is usually selected by some ad-hoc procedure. However, taking $p^2 n_1$ considerably larger than $n_2$ is often necessary to guarantee a reasonable statistical efficiency, see e.g. [12],[13]. We have adopted a constant ratio between $p^2 n_1$ and $n_2$ for convenience. Thus, equation (20) is solved for different values of $n_2$, where $p^2 n_1$ is always taken to be a fixed multiple of $n_2$. For each solution, the Akaike information criterion [17] is computed, and the final choice of $n_2$ is made by minimizing this criterion.
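The overdetermined least-squares step of this subsection can be sketched as follows. This is an illustrative outline only: it assumes the stacked covariance vectors are already available, uses rows at lags $n_2+1, \ldots, n_2+n_1$, solves the normal equations by Gaussian elimination for brevity (an orthogonal factorization would be numerically preferable), and omits the Akaike order selection. The names `modified_yule_walker` and `solve_linear` are hypothetical.

```python
def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [M[i][:] + [v[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def modified_yule_walker(rho, n2, n1):
    """Least-squares estimate of a_1..a_n2 from stacked covariances.

    rho[i] is the stacked covariance vector at lag i; lags 1..n2+n1 are
    used, and rho[0] is never referenced.
    """
    rows, rhs = [], []
    for i in range(n2 + 1, n2 + n1 + 1):
        for comp in range(len(rho[i])):
            rows.append([rho[i - k][comp] for k in range(1, n2 + 1)])
            rhs.append(-rho[i][comp])
    gram = [[sum(r[a] * r[b] for r in rows) for b in range(n2)]
            for a in range(n2)]
    proj = [sum(r[a] * y for r, y in zip(rows, rhs)) for a in range(n2)]
    return solve_linear(gram, proj)


# Assumed test data: exact covariances with two real modes, 0.5 and -0.8.
m1, m2 = [1.0, 0.2, 0.2, 0.5], [0.3, -0.1, -0.1, 0.7]
rho = [[(0.5 ** i) * u + ((-0.8) ** i) * w for u, w in zip(m1, m2)]
       for i in range(11)]
a_hat = modified_yule_walker(rho, n2=2, n1=6)
```

With exact covariances the fit recovers $a(z^{-1}) = (1 - 0.5z^{-1})(1 + 0.8z^{-1}) = 1 + 0.3z^{-1} - 0.4z^{-2}$. Because the zero-lag term never enters the equations, additive white measurement noise does not affect this step.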

3.2 ORDER REDUCTION BY MODAL DECOMPOSITION

Let us denote the $n_2$-th order polynomial obtained from the modified Yule-Walker equations by $\tilde{a}(z^{-1})$. As explained before, the degree of $\tilde{a}(z^{-1})$ is usually much larger than the true degree of the characteristic polynomial. Furthermore, it was shown in [8] that a final estimate of the characteristic polynomial can be chosen to be a divisor of $\tilde{a}(z^{-1})$. This divisor is obtained by the following process of decomposition and elimination. Let us factor $\tilde{a}(z^{-1})$ into its first- and second-order real factors:

$$\tilde{a}(z^{-1}) = \Big( \prod_{i=1}^{n_r} d_i(z^{-1}) \Big)\Big( \prod_{i=1}^{n_c} e_i(z^{-1}) \Big) , \qquad (21)$$

where $n_r$ is the number of real roots and $n_c$ is the number of complex pairs of roots, so that $n_2 = n_r + 2n_c$. The polynomials $\{d_i(z^{-1})\}$ are of degree 1, and the polynomials $\{e_i(z^{-1})\}$ are of degree 2, i.e.,

$$d_i(z^{-1}) = 1 + d_{1,i}z^{-1} , \qquad (22a)$$

$$e_i(z^{-1}) = 1 + e_{1,i}z^{-1} + e_{2,i}z^{-2} . \qquad (22b)$$

Since $\tilde{a}(z^{-1})$ is not guaranteed to be stable, it is necessary to replace it by a stable spectral factor of $\tilde{a}(z^{-1})\tilde{a}(z)$. This is done by reflecting the unstable roots of $\tilde{a}(z^{-1})$ inside the unit circle, as follows: whenever $|d_{1,i}| > 1$ we redefine $d_i(z^{-1})$ as

$$d_i(z^{-1}) = 1 + \frac{1}{d_{1,i}}z^{-1} . \qquad (23a)$$

Similarly, whenever $|e_{2,i}| > 1$, we redefine $e_i(z^{-1})$ as

$$e_i(z^{-1}) = 1 + \frac{e_{1,i}}{e_{2,i}}z^{-1} + \frac{1}{e_{2,i}}z^{-2} . \qquad (23b)$$
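The reflection rules can be written as two small helpers. This is a sketch under the assumption that each factor is stored by its coefficients as in (22a) and (22b), and that second-order factors correspond to complex-conjugate pairs (so that the squared root modulus equals the $z^{-2}$ coefficient); the function names are hypothetical.

```python
def reflect_first_order(d1):
    """Coefficient of z^-1 in 1 + d1*z^-1 after reflecting an unstable
    real root (at -d1) to its reciprocal inside the unit circle."""
    return 1.0 / d1 if abs(d1) > 1.0 else d1


def reflect_second_order(e1, e2):
    """Coefficients (e1, e2) of 1 + e1*z^-1 + e2*z^-2 after reflecting an
    unstable complex-conjugate root pair inside the unit circle."""
    if abs(e2) > 1.0:
        return e1 / e2, 1.0 / e2
    return e1, e2


# Roots of z^2 - 2z + 4 are 1 +/- j*sqrt(3), with modulus 2.
stable_pair = reflect_second_order(-2.0, 4.0)
stable_real = reflect_first_order(2.0)
```

In this example the reflected pair has roots of modulus 0.5 at the same angles, and the real root at -2 moves to -0.5. Factors that are already stable are returned unchanged.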


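Assuming that all the roots of $\tilde{a}(z^{-1})$ are distinct, we can use (10) to make the following approximation.

$$\sum_{i=1}^{n_3} \hat{R}_i z^{-i} \approx \Big\{ \sum_{j=1}^{n_r} \frac{D_{1,j}z^{-1}}{1 + d_{1,j}z^{-1}} + \sum_{j=1}^{n_c} \frac{E_{1,j}z^{-1} + E_{2,j}z^{-2}}{1 + e_{1,j}z^{-1} + e_{2,j}z^{-2}} \Big\} * \Pi(z) , \qquad (24)$$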

where  n(z)  the  z- transform  of  a  rectangular  window  on  the  interval  [1,03], 
and  *  denotes  a  complex  convolution.  The  number  of  covariances  03  ^5  chosen 
so  that  '’3  '’2  * 

The expansion (24) will be used to select modes that will appear in the final estimated characteristic polynomial, and to eliminate undesired modes. To do this we estimate $\{D_{1,j}\}$ and $\{E_{1,j}, E_{2,j}\}$ by performing the following least-squares fit in (24): let us denote

(25a)

(25b)


Then we can express (24) in the time domain as


where

(26c)


(26d)


The vectors appearing in (26) are obtained by stacking columns: one vector from the columns of each $D_{1,j}$, one from the columns of each $E_{1,j}$, and one from the columns of each $E_{2,j}$ (cf. equation (16)). Equation (26a) is now solved in the least-squares sense and the solutions are "unstacked" to yield $\{D_{1,j}\}$ and $\{E_{1,j}, E_{2,j}\}$.

The next step is to compute the energies of the individual modes in the
various channels. It is not difficult to show that these energies are given
by the following formulas.


Energy of j-th real mode in channel $\ell$:

(27a)


Energy of j-th complex mode in channel $\ell$:

(27b)


Typically, the true modes (i.e., those present in the actual spectrum) will
tend to have relatively high energies, while spurious modes will have
relatively low energies. We therefore arrange the $(n_r + n_c)p$ energies in order
of decreasing magnitudes, and associate each energy with its "parent mode".
The mode selection process can now be done, using either of the two following
criteria:


(i) Energy threshold criterion.

In this case all modes whose energies are above a certain threshold
are retained, and the other modes are discarded. It is convenient to
measure all the energies in dB relative to the highest energy; a
reasonable threshold would then be, e.g., -50 dB.


(ii) Order criterion.

In this case a pre-selected number of modes (corresponding to the
highest energies) is retained, and the rest are deleted. This is
convenient in cases where the true order of $a(z^{-1})$ is known a
priori.

Finally, all modes chosen to be retained are multiplied out to form the final
estimated characteristic polynomial $\hat a(z^{-1})$. Clearly, $\hat a(z^{-1})$ is a divisor
of the initial estimate $a(z^{-1})$. Also, the mode selection procedure described above
guarantees that these modes capture most of the signal energy, in the sense of the
approximation (24).
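The two selection criteria can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `select_modes` is hypothetical, the energies are assumed to be already available as a positive array with one row per mode and one column per channel, and a mode is ranked by its largest channel energy (the report does not spell out how the per-channel energies of one mode are combined).

```python
import numpy as np

def select_modes(energies, threshold_db=-50.0, order=None):
    """Return the (sorted) indices of the modes to retain.

    energies : array of shape (n_modes, p), positive mode energies per channel.
    threshold_db : energy threshold in dB relative to the highest energy.
    order : if given, keep this many highest-energy modes instead (order criterion).
    """
    energies = np.asarray(energies, dtype=float)
    peak = energies.max(axis=1)                    # assumed ranking: strongest channel
    rel_db = 10.0 * np.log10(peak / peak.max())    # dB relative to the highest energy
    ranking = np.argsort(-rel_db)                  # decreasing energy
    if order is not None:
        keep = ranking[:order]                     # order criterion
    else:
        keep = ranking[rel_db[ranking] >= threshold_db]  # energy threshold criterion
    return np.sort(keep)

# Three hypothetical modes in a two-channel problem; the second one is spurious.
example_energies = [[1.0, 0.5], [1e-7, 1e-8], [0.2, 1e-3]]
kept_by_threshold = select_modes(example_energies, threshold_db=-50.0)
kept_by_order = select_modes(example_energies, order=1)
```

The retained first- and second-order factors would then be multiplied out (for example with repeated polynomial convolution) to form the final characteristic polynomial.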

3.3  ESTIMATION  OF  THE  NUMERATOR 

Numerator estimation is based on the additive decomposition (11), (12).
Similarly to (24), we can approximate by

$$\tfrac{1}{2} R_0 + \sum_{i=1}^{n_4} R_i z^{-i} \approx \Big\{ \frac{C(z^{-1})}{\hat a(z^{-1})} \Big\} * \Pi(z), \tag{28}$$

where $\Pi(z)$ is now the z-transform of a rectangular window on the interval
$[0, n_4]$. The number $n_4$ can be taken to be much smaller than $n_3$, because the
order of $\hat a(z^{-1})$ is usually much smaller than that of $a(z^{-1})$. Let us
denote


t"'- 


1 


•v 

£ 


Then  we  can  rewrite  (28)  as 


(30) 


where the unknown vectors are obtained by stacking the columns of $C_i$.

Equation (30) is solved in the least-squares sense, and the solutions are
"unstacked" to form $\{C_i\}$. Finally, the numerator of the spectral density
matrix is computed by

$$N(z) = C(z^{-1})\hat a(z) + C^T(z)\hat a(z^{-1}). \tag{31}$$

The estimated numerator $N(z)$ and denominator $\hat a(z^{-1})$ can be inserted into
equation (7) to provide the desired spectral estimate.

4. SIMULATION EXAMPLES


The algorithm described in the previous section was programmed and tested
for various types of two-channel processes. Here we illustrate the
performance of the algorithm by three examples. In all examples the number of
data points was $T = 1024$, and $n_1 = 20$, $n_2 = 16$, $n_3 = 16$, $n_4 = 8$.

Example  #1: 

Here we generated the data by a right MFD, where


(32a)


(32b)


Figure 1a shows the autospectra $S_{11}(\omega)$ and $S_{22}(\omega)$, and the co-spectrum
$S_{21}(\omega)$ (magnitude and phase) of this model. Figure 1b shows the
corresponding estimates obtained by the algorithm. As we see, the estimates
match the true spectra fairly closely, except at frequencies where the energy
density is very low. This is not surprising, since any estimation based on a
least-squares fit would give little weight to low-energy regions.


Example #2:


Here we generated data according to the model

$$y_1(t) = \sqrt{2}\,\sin 2\pi f_1 t + \sqrt{2}\,\sin 2\pi f_2 (t-D) + n_1(t), \tag{33a}$$

$$y_2(t) = \sqrt{2}\,\sin 2\pi f_2 t + n_2(t), \tag{33b}$$

where $n_1(t)$, $n_2(t)$ are uncorrelated zero-mean white-noise sequences with
unit variance. The frequencies $f_1$, $f_2$ are 0.12 Hz and 0.18 Hz,
respectively. The time delay $D$ is 2 seconds.
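For readers who wish to reproduce a data set of this kind, the following is a minimal sketch. It assumes a sampling interval of one second and an amplitude of $\sqrt{2}$ (unit power) for every sinusoid; the function name `example2_data` and the random seed are illustrative choices, not taken from the report.

```python
import numpy as np

def example2_data(T=1024, f1=0.12, f2=0.18, delay=2.0, seed=0):
    """Two-channel sinusoids-in-noise data; returns an array of shape (T, 2)."""
    rng = np.random.default_rng(seed)
    t = np.arange(T, dtype=float)          # assumed: one sample per second
    amp = np.sqrt(2.0)                     # assumed: unit-power sinusoids
    y1 = (amp * np.sin(2 * np.pi * f1 * t)
          + amp * np.sin(2 * np.pi * f2 * (t - delay))
          + rng.standard_normal(T))        # unit-variance white noise n1(t)
    y2 = amp * np.sin(2 * np.pi * f2 * t) + rng.standard_normal(T)  # n2(t)
    return np.column_stack([y1, y2])

y_example2 = example2_data()
```

Only the second sinusoid is common to both channels, so any energy estimated at $f_1$ in the second channel is the "leakage" effect discussed in the text.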

The estimated autospectra and co-spectrum are shown in Figure 2. Note
that both sinusoids are well represented in $S_{11}(\omega)$. Also note the slight
"leakage" of the first sinusoid into the second channel. In the next section
we comment on how such "leakage" can be avoided.

Example  #3: 

Here we generated the data by the left MFD $A_L^{-1}(z)B_L(z)$, where

(34a) 


(34b) 


1  +  0.2z‘^  -0.3z’^  -  O.lz 

0.4  +  O.lz"^  1  +  0.3z'^ 


Al(z)  = 


-1  -2 
1  -  z  ^  +  0.8z 


-2 


-0.02z'^  +  O.Olz'^  1-1. 2z‘^  +  z"^ 
The characteristic polynomial in this example has two narrowband modes with
relatively close frequencies, as shown in the true spectra in Figure 3a.
Also, both channels have relatively low energies in the high frequency band.
As we see in Figure 3b, the estimates are fairly accurate in the low frequency
band, but quite inaccurate in the high frequency band. Evidently, the
algorithm has problems in adequately representing the frequencies where the
energy density is low.

Numerous other tests not shown here indicated a similar behavior: good
accuracy in high-energy regions, poor accuracy in low-energy regions, and some
"leakage" of energy from one channel to the other.


5.  DISCUSSION 


We presented a spectral estimation algorithm for multichannel stationary
time series with rational spectra. The proposed algorithm is non-iterative
and requires mostly the solution of linear sets of equations, except for the
factorization of the estimated characteristic polynomial. The algorithm is
fairly robust in the sense that

(i) No initial conditions are necessary;

(ii) Various types of rational models can be handled by the algorithm;

(iii) The model order need not be known a priori, but is estimated by
the algorithm.

The  main  disadvantages  of  the  algorithm  appear  to  be  as  follows: 

(i)  The  algorithm  is  not  efficient  in  the  statistical  sense; 

(ii)  The  accuracy  of  the  estimates  in  frequencies  of  low  energy 
densities  is  poor; 

(iii)  Some  inter-channel  "leakage"  is  apparent; 

(iv)  Positive  definiteness  of  S{z)  on  the  unit  circle  is  not 
guaranteed. 

Point (i) is inherent to any algorithm based on the sample covariances. Point
(ii) is also typical of many algorithms based on sample covariances,
especially those which are based on some least-squares fit. Point (iii) can
be largely solved by the following modification of the algorithm: instead of
performing mode selection using the diagonal elements of $S(z)$, we can
perform individual mode selections for the $p^2$ elements of this matrix to
obtain $p^2$ different denominators, instead of one common denominator. The
various thresholds used in this selection procedure can be adjusted so as to
eliminate any undesired leakage. The improved version of the algorithm is
currently under investigation, and results will be reported later.

As to point (iv), the only way to guarantee positive definiteness of the
spectrum appears to be by direct estimation of the spectral factor $B(z^{-1})$.
This leads, however, to a nonlinear problem, which requires some iterative
techniques for its solution; see e.g. [18], [19]. Procedures for estimating
the spectral factor $B(z^{-1})$ appear to be inherently more complex than
techniques for estimating $B(z^{-1})B^T(z)$.

Figure 1a: Example #1: right MFD - true spectrum


Figure 1b: Example #1: right MFD - estimated spectrum


Figure 2: Example #2: sinusoids in noise - estimated spectrum


Figure 3a: Example #3: left MFD - true spectrum

Figure 3b: Example #3: left MFD - estimated spectrum

APPENDIX F

OPTIMAL INSTRUMENTAL VARIABLE MULTISTEP ALGORITHMS FOR
ESTIMATION OF THE AR PARAMETERS OF AN ARMA PROCESS

P. Stoica, B. Friedlander and T. Söderström

ABSTRACT

Multistep implementations are derived for the optimal instrumental
variable (OIV) estimators introduced in [1]. The proposed algorithms provide
asymptotically efficient estimates of the AR parameters of an ARMA process.

1. INTRODUCTION


The need for estimating the parameters of an autoregressive moving-average
(ARMA) process arises in many applications in the areas of signal
processing, spectral analysis, estimation and system identification. A
computationally attractive estimation procedure, which has received
considerable attention in the literature, is based on a two-step approach:
first the autoregressive (AR) parameters are estimated using the modified
Yule-Walker (MYW) equations; then the moving average (MA) parameters are
estimated by one of several available techniques [2]-[7].

In  this  paper  we  consider  only  the  first  step  of  estimating  the 
autoregressive  parameters.  In  many  engineering  applications  the  second 
estimation  step  is  not  needed.  The  prime  example  is  the  estimation  of 
autoregressive signals corrupted by white measurement noise. In this case all
the information about the spectral shape of the signal lies in the AR
parameters of the signal-plus-noise ARMA process.

In a companion paper [1] we presented a number of results related to the
asymptotic accuracy of a fairly general class of instrumental variable (IV)
estimators, which includes the MYW estimator as a special case. In
particular, it was shown that estimation accuracy increases monotonically with
the number of MYW equations for an optimal choice of the weighting matrix used
in the least squares solution of these equations. Furthermore, the asymptotic
error covariance of the optimal IV method equals that of the prediction error
method. In other words, the optimal IV method is asymptotically (as the
number of data points and the number of MYW equations tend to infinity)
efficient. An alternative form of the optimal IV method, involving
pre-filtering of the data used in the instrument vector while using a minimal
number of MYW equations, was also discussed.

The main difficulty associated with the optimal IV method is that the
optimal weighting matrix, and the optimal pre-filter, depend on the
second-order statistics of the data, which are not known a priori. The objective of
this paper is to propose several multistep algorithms for overcoming this
difficulty. As we will show, these algorithms provide asymptotically
efficient estimates of the AR parameters.

The structure of the paper is as follows. In section 2 we briefly review
the results on optimal instrumental variable (OIV) estimation derived in
[1]. Three approximate implementations of the OIV are presented and analyzed
in section 3: one based on an optimal weighting matrix and two based on an
optimal pre-filtering operation. The implementation of these forms of the OIV
estimator by means of a multistep procedure is discussed in section 4. The
performance of the proposed estimation techniques is studied in section 5 by
means of some numerical examples.

The work presented here and in [1] provides an extension of IV methods,
which are usually applied to system identification, to problems of time series
analysis. For an overview of IV methods and their applications see [11]-[12].

2.  THE  OPTIMAL  IV  ESTIMATES 

Consider the following ARMA process of order (na, nc):

$$A(q^{-1})\,y(t) = C(q^{-1})\,e(t), \tag{1}$$

where

$e(t)$ = white noise process with zero mean and variance $\lambda^2$,

$A(q^{-1}) = 1 + a_1 q^{-1} + \dots + a_{na} q^{-na}$,

$C(q^{-1}) = 1 + c_1 q^{-1} + \dots + c_{nc} q^{-nc}$,

$q^{-1}$ = unit delay operator ($q^{-1}y(t) = y(t-1)$).

The following assumptions are made:

A1: $A(z) = 0 \Rightarrow |z| > 1$; $C(z) = 0 \Rightarrow |z| > 1$.

In other words, the ARMA representation (1) is stable and invertible.
This is not a restrictive assumption (cf. the spectral factorization theorem,
e.g., [16]).

A2: $a_{na} \neq 0$, $c_{nc} \neq 0$, and $\{A(z), C(z)\}$ are coprime polynomials.

In other words, (na, nc) are the minimal orders of the ARMA model (1).
Next we introduce the notation:

$$\phi(t) = [-y(t-1), \dots, -y(t-na)]^T, \qquad \theta = [a_1, \dots, a_{na}]^T, \qquad v(t) = C(q^{-1})\,e(t). \tag{2}$$

Then equation (1) can be rewritten as

$$y(t) = \phi^T(t)\,\theta + v(t). \tag{3}$$


The unknown parameter vector $\theta$ will be estimated by minimizing a
quadratic cost function involving the data vector $\phi(t)$ and an IV vector
$z_m(t)$:

$$\hat\theta = \arg\min_{\theta} \Big\| \Big[ \sum_{t=1}^{N} z_m(t)\phi^T(t) \Big]\theta - \sum_{t=1}^{N} z_m(t)y(t) \Big\|_Q^2, \tag{4}$$

where

$N$ = number of data points,

$\|x\|_Q^2 = x^T Q x$, with $Q$ a positive definite matrix,

and

$$z_m(t) = G(q^{-1}) \begin{bmatrix} y(t-nc-1) \\ \vdots \\ y(t-nc-m) \end{bmatrix}, \qquad m \ge na, \tag{5}$$

where $G(q^{-1})$ is a rational filter. We assume that


A3: $G(q^{-1})$ is stable and invertible, and $G(0) = 1$.


It is straightforward to show that the IV estimate in equation (4) can be
obtained by a least-squares solution of the following system of linear
equations:

$$Q^{1/2} \Big[ \sum_{t=1}^{N} z_m(t)\phi^T(t) \Big] \hat\theta = Q^{1/2} \Big[ \sum_{t=1}^{N} z_m(t)y(t) \Big], \tag{6}$$

where $Q^{1/2}$ is a matrix square-root of the weighting matrix $Q$ (i.e.,
$Q = Q^{T/2} Q^{1/2}$). The class of estimates defined by (6) includes the various
MYW estimation techniques discussed in the literature as special cases; see [1]
for details.
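As an illustration of (4)-(6), the sketch below forms the two sums directly from the data and solves the (generally overdetermined) system in the least-squares sense. It assumes $G(q^{-1}) = 1$; the function name `iv_estimate`, the zero-based array indexing, and the simulated ARMA(2,1) test process are illustrative choices rather than part of the original paper.

```python
import numpy as np

def iv_estimate(y, na, nc, m, Q_half=None):
    """Least-squares solution of (6) with G(q^-1) = 1; returns the AR estimates."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    t0 = max(na, nc + m)                       # first index where all lags exist
    R = np.zeros((m, na))                      # accumulates sum z_m(t) phi^T(t)
    r = np.zeros(m)                            # accumulates sum z_m(t) y(t)
    for t in range(t0, N):
        phi = -y[t - na:t][::-1]               # [-y(t-1), ..., -y(t-na)]
        z = y[t - nc - m:t - nc][::-1]         # [y(t-nc-1), ..., y(t-nc-m)]
        R += np.outer(z, phi)
        r += z * y[t]
    if Q_half is None:
        Q_half = np.eye(m)                     # Q = I gives a basic MYW-type estimate
    theta, *_ = np.linalg.lstsq(Q_half @ R, Q_half @ r, rcond=None)
    return theta

# Simulated ARMA(2,1): A = 1 - 1.5 q^-1 + 0.7 q^-2, C = 1 + 0.5 q^-1 (illustrative values).
rng = np.random.default_rng(1)
N = 20000
e = rng.standard_normal(N)
y_sim = np.zeros(N)
for t in range(N):
    y_sim[t] = e[t]
    if t >= 1:
        y_sim[t] += 1.5 * y_sim[t - 1] + 0.5 * e[t - 1]
    if t >= 2:
        y_sim[t] -= 0.7 * y_sim[t - 2]

theta_myw = iv_estimate(y_sim, na=2, nc=1, m=4)
```

With $Q = I$ the result is consistent but not efficient; the weighting matrix and pre-filter choices discussed next are what make the estimate optimal.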

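It was shown in [1] that, under conditions A1-A3, the covariance matrix $P_m$
of the normalized estimation error $(\sqrt{N}/\lambda)(\hat\theta - \theta)$ obeys the inequality

$$P_m \ge \tilde P_m \triangleq [R_m^T S_m^{-1} R_m]^{-1}, \tag{7}$$

where

$$R_m = E\{z_m(t)\phi^T(t)\} \quad (m \times na), \tag{8}$$

$$S_m = E\{[C(q^{-1})z_m(t)][C(q^{-1})z_m(t)]^T\}. \tag{9}$$

Equality in (7) can be shown to hold for the "optimal" weighting matrix

$$Q = S_m^{-1}. \tag{10}$$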


Furthermore, it was shown that

$$\tilde P_m \ge \tilde P_{m+1}. \tag{11}$$

The monotonically non-increasing sequence $\{\tilde P_m\}$ converges to a limit denoted by
$\tilde P_\infty$. This limit was shown to equal the (normalized) error covariance matrix
$P_{PEM}$ associated with the Prediction Error Method (PEM). See [15] for a
discussion of the PEM and its properties. Here we note only that PEM is an
efficient estimator, i.e., $P_{PEM}$ equals the Cramer-Rao lower bound. Thus, the
IV estimator in (6) is asymptotically efficient if we set $Q = S_m^{-1}$ and let
$m \to \infty$. The asymptotic error covariance matrix $\tilde P_\infty$ does not depend on
$G(q^{-1})$ and we will usually choose $G(q^{-1}) = 1$. See [1] for proofs of the
statements above.


It was shown, however, that the rate of convergence of $\tilde P_m$ is affected by
the choice of $G(q^{-1})$, see [1]. If we set $G(q^{-1}) = 1/C^2(q^{-1})$, then we get
the fastest possible "convergence rate": $\tilde P_m = \tilde P_\infty$ for $m \ge na$. Note that for
$m = na$ the matrix $Q$ in (6) does not affect the solution and can be set to $Q = I$.

Another interesting choice for $G(q^{-1})$ is $G(q^{-1}) = A(q^{-1})/C^2(q^{-1})$. In
this case $S_m = \lambda^2 I_m$. Thus, the optimal weighting matrix is $Q = I$ (the
scaling factor $1/\lambda^2$ does not matter).
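The last claim can be checked in one line. The display below is a short worked step, written under the assumption (suggested by the relation $S_m = \lambda^2 I_m$ itself) that $S_m$ is the covariance matrix of the filtered instrument vector $C(q^{-1})z_m(t)$, with $z_m(t)$ as in (5):

```latex
C(q^{-1})\,z_m(t)
  = \frac{A(q^{-1})}{C(q^{-1})}
    \begin{bmatrix} y(t-nc-1) \\ \vdots \\ y(t-nc-m) \end{bmatrix}
  = \begin{bmatrix} e(t-nc-1) \\ \vdots \\ e(t-nc-m) \end{bmatrix},
\qquad
E\big\{[C(q^{-1})z_m(t)]\,[C(q^{-1})z_m(t)]^T\big\} = \lambda^2 I_m .
```

The first equality uses $G(q^{-1}) = A(q^{-1})/C^2(q^{-1})$, the second uses (1) in the form $A(q^{-1})y(t)/C(q^{-1}) = e(t)$, and the covariance follows because $e(t)$ is white with variance $\lambda^2$.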


To summarize, we have (at least) three ways of generating optimal IV
estimates using equation (6):


OIV-1: $Q = S_m^{-1}$, $G(q^{-1}) = 1$, $m \to \infty$ (12)


OIV-2: $Q = I$, $G(q^{-1}) = 1/C^2(q^{-1})$, $m = na$ (13)


OIV-3: $Q = I$, $G(q^{-1}) = A(q^{-1})/C^2(q^{-1})$, $m \to \infty$ (14)


The problem is that all of these methods depend on knowledge of unknown
quantities. This is the usual dilemma in accuracy optimization. Our aim here
is to show how to overcome this difficulty for the case under consideration.

We will start by showing that replacing $S_m$ (in OIV-1), $C(q^{-1})$ (in OIV-2)
and $C(q^{-1})$, $A(q^{-1})$ (in OIV-3) by their consistent estimates will not affect
the asymptotic estimation accuracy. Then we show how to obtain such consistent
estimates of $S_m$ and $C(q^{-1})$. The proposed estimation procedures are
therefore based on estimating $S_m$ or $C(q^{-1})$ and using these estimates in (6)
instead of the true $S_m$ or $C(q^{-1})$. As we will see, the implementation of
OIV-1 does not require explicit computation of $\hat C(q^{-1})$. This may be
advantageous in applications where only the AR parameters need to be
estimated. A more detailed discussion of the proposed algorithms will be
given in the following sections.

3.  ANALYSIS  OF  THE  OPTIMAL  IV  MULTISTEP  ESTIMATORS 

In this section we analyze the asymptotic properties of OIV-1 and OIV-2
by techniques similar to those used in [1], [13].

3.1  APPROXIMATE  OIV-1 

Let $\hat S_m$ denote a consistent estimate of $S_m$. Let $\hat\theta_1$ denote the OIV-1
estimate (6), (12) for a given $m$ (possibly $m = m(N)$, where $m(N)$ increases
without bound as $N \to \infty$), and let $\hat{\hat\theta}_1$ be the approximate OIV-1 estimate with $S_m$
replaced by $\hat S_m$. Then we can state the following theorem.

Theorem  3.1 

A  ^ 

Let  assumptions  A1-A3  be  true,  and  assume  also  that  =  0(1/N)  and 

3  A  A 

that  Cm(N)]  /N  as  M  -►  «  .  Then  and  a.  are  asymptotically  equivalent. 

^  ^  A 

A  A 

We  will  say  that  two  consistent  estimates  of  a  ,  say  a^  and  a^^  ,  are 
asymptotically  equivalent  if 

®1  “  ®1  *  for  0  > 

†We will use throughout the paper the notation $O(\varepsilon)$ to denote a random
variable with standard deviation $K\varepsilon$, where $\varepsilon$ is small and where $K$ is a
(finite) constant independent of $\varepsilon$.


From the assumption above it follows that for sufficiently large N we


m  m 


Note that the OIV-1 solution of (6) can be written explicitly as

where 

It  is  straightforward  to  show  that 

Similarly, 

From  (15),  (18),  (19)  it  follows  that 


t»l 


=  (^9^  -  0]  +  0(m  /N) 


Since  m^/N  =  .  and  since  m^Z/fT  0  by  assumption,  it  follows 

that  the  second  term  in  (20)  goes  to  zero  as  M  +  »  ,  faster  than  the  term 
9j^  -  9  (which  is  0[mZ/N)  ,  cf.  (18)). 

The convergence of $\tilde P_m$ to $\tilde P_\infty$ may be slow, especially if $C(z)$ has zeros
close to the unit circle, see [1]. For the idealized OIV-1 estimate $\hat\theta_1$ we
may then need to consider a large $m$ in order to obtain good accuracy. For the
practical estimate $\hat{\hat\theta}_1$ the situation is, however, different. If $m$ is too
large with respect to $N$ then $\hat{\hat\theta}_1$ and $\hat\theta_1$ may not have the same distribution,
and thus $\hat{\hat\theta}_1$ may not be (asymptotically) optimal. Theorem 3.1 gives an upper
bound on the growth rate of $m(N)$ guaranteeing that $\hat{\hat\theta}_1$ and $\hat\theta_1$ are
asymptotically equivalent. However, no attempt has been made to give a tight
bound. In fact this seems quite difficult, since a tight bound would be
problem dependent. In section 5 we discuss this point further and illustrate
it by means of some simulations. It is shown there that the bound of Theorem
3.1 is quite conservative. That is to say, $\hat{\hat\theta}_1$ and $\hat\theta_1$ may behave similarly
even for values of $m$ exceeding this bound. As explained earlier, one needs to
consider large values of $m$ when the convergence of $\tilde P_m$ to $\tilde P_\infty$ is slow.

3.2 APPROXIMATE OIV-2

Let $\hat C(q^{-1})$ denote a consistent estimate of $C(q^{-1})$. Let $\hat{\hat\theta}_2$ be the
approximate estimate obtained by replacing $C(q^{-1})$ by $\hat C(q^{-1})$ in the OIV-2
estimate $\hat\theta_2$ (6), (13). Then we can state the following theorem.

Theorem 3.2

Let assumptions A1-A3 hold true and assume that
$\hat c_i - c_i = O(1/\sqrt{N})$, $i = 1, \dots, nc$. Then $\hat\theta_2$ and $\hat{\hat\theta}_2$ are asymptotically equivalent.

Proof:

The  OIV-2  solution  of  (6)  can  be  written  explicitly  as 


9 


(21) 


^  -11™  1 

t=l  C  (q  ) 


where 


1^1  T 

®  M  I  “5 - ?— 4'^t"'’c)  ij)  (t) 

^  t=l  c2{q‘h 


(q  M 

It  is  straightforward  to  show  that 


®2"®  =  °n^{Ki  ;2Ai\ 

t=l  C  (q  ) 


and  with 


that 


*1^1  T 

■  m"  I  “5 - r~  ii(t-nc)  ,<)  ( t) 

^  ^  t=l  c2{q-M 


^2'^  "  (f  i,  >27^;  ^(t-nc)  v(t)}  = 
t=i  C  (q  ) 


-11^'*  1 

%  IfT  I  ;:?7-.T-<»(^-nc)  v(t) 
t=l  C  Iq  ) 

t-?i  . 


(22) 


(23) 


(24) 


...o(t-2nc)]v(t)](C-C)}+0(l/N)=(92-9)+0(l/M) 


(25) 


where 


C  =  Cc^, 


A 

c 


] 


T 


Since $\hat\theta_2 - \theta = O(1/\sqrt{N})$ it follows from (25) that $\hat\theta_2$ and $\hat{\hat\theta}_2$ are asymptotically
equivalent. Note that here the choice of the number of MYW equations is not
an issue, since optimality is achieved for $m = na$. However, the
implementation of the OIV-2 estimator requires estimation of the
$C(q^{-1})$ polynomial, while this can be avoided when implementing OIV-1.
Estimating $C(q^{-1})$ is not an easy task and one often wants to avoid it, if
possible. The relative advantages and disadvantages of the two estimators are
discussed further in the following sections.

3.3 APPROXIMATE OIV-3

Let $\hat C(q^{-1})$ and $\hat A(q^{-1})$ denote consistent estimates of $C(q^{-1})$ and $A(q^{-1})$,
respectively. Let $\hat{\hat\theta}_3$ be the approximate estimate obtained by replacing
$C(q^{-1})$ and $A(q^{-1})$ by $\hat C(q^{-1})$ and $\hat A(q^{-1})$ in the OIV-3 estimate $\hat\theta_3$ (6),
(14). Then we can state the following theorem.

Theorem 3.3

Let assumptions A1-A3 hold true and assume that $\hat c_i - c_i = O(1/\sqrt{N})$,
$i = 1, \dots, nc$, and $\hat a_i - a_i = O(1/\sqrt{N})$, $i = 1, \dots, na$. Then $\hat\theta_3$ and $\hat{\hat\theta}_3$ are
asymptotically equivalent.

Proof:

Let 

X  '26) 

where 


then 


z  (t) 
m 


=  A(q~h 

C^(q"M 


y(t-nc-l) 

y(t-nc-m) 


(27) 


0{l//f 


(28) 


where  is  as  definefl  in  (5)  with  G{q“^)  =  A(q"M/C^(q"M 


Thus, 


(rJ  R^)‘^  =  {RjR^+0(m//N))"^=(RjR^)'^+0(m//Tr) 


9,-9  =  C(rIr„)'^  +  0(m//N)]CRj+O(l//F)] 
o  N  N  N 


1 

0(1/N)]  = 


[(RjRN)"^*0(m//N)][Rj  ^  I  z^(t)v(t)+0(m//N)] 

t“  1 

— 

0(m//fn 


=  (93-9)  +  0(m  /N) 


93  -  93  =  0(m^/N) 


Since $\hat\theta_3 - \theta = O(m/\sqrt{N})$ (see (32)), we conclude that for $\hat\theta_3$ and $\hat{\hat\theta}_3$ to
be asymptotically equivalent it is sufficient that

$$m/\sqrt{N} \to 0 \quad \text{as } m, N \to \infty. \tag{34}$$

In that case $m^2/N = (m/\sqrt{N})^2$ goes to zero faster than $m/\sqrt{N}$.

The requirement in (34) that $m/\sqrt{N} \to 0$ is not restrictive, since the fact
that $\hat\theta_3 - \theta = O(m/\sqrt{N})$ will not be true if (34) does not hold.

The  behavior  of  P  (for  OIV-3)  as  m  increases  is  quite  different  from 
m 

its  behavior  in  the  case  of  OIV-1.  By  specializing  the  results  in  [1]  to  the 
case  of  OIV-3,  it  can  be  readily  shown  that  obeys  the  discrete-time 
Lyapunov  equation 


where 


1  *  *  *  *  ’ 


0 

•  .  1 


b  =  Y 


C(q'M 


e(t-nc-l ) } 


From the equations above we conclude that the convergence rate of $\tilde P_m$ depends
on the zeros of $A(q^{-1})$, but not on the zeros of $C(q^{-1})$. The reverse is
true for OIV-1; see [1] for details. More specifically, the rate at which
$\tilde P_m$ approaches $\tilde P_\infty$ is governed by the zero of $A(q^{-1})$ with the largest
modulus. Thus, when $A(q^{-1})$ has zeros closer to the unit circle than the zeros
of $C(q^{-1})$, we expect $\tilde P_m$ to converge faster for OIV-1 than for OIV-3 (and
vice versa when the zeros of $C(q^{-1})$ are closer to the unit circle than the
zeros of $A(q^{-1})$).


4.  IMPLEMENTATION  OF  THE  OPTIMAL  IV  MULTISTEP  ESTIMATORS 


4.1  THE  OIV-1  ALGORITHM 


Let us denote by $r_v(\tau)$ and $R_z(\tau)$ the covariances of $v(t)$ and $z_m(t)$,
respectively:

$$r_v(\tau) = E\{v(t)\,v(t-\tau)\}, \tag{35}$$

$$R_z(\tau) = E\{z_m(t)\,z_m^T(t-\tau)\}. \tag{36}$$


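Next note that

$$S_m = E\{[C(q^{-1})z_m(t)][C(q^{-1})z_m(t)]^T\} = \sum_{i=0}^{nc}\sum_{j=0}^{nc} c_i c_j R_z(j-i) = \sum_{\tau=-nc}^{nc} \Big[ \sum_{j=0}^{nc} c_j c_{j-\tau} \Big] R_z(\tau) = \frac{1}{\lambda^2} \sum_{\tau=-nc}^{nc} r_v(\tau) R_z(\tau), \tag{37}$$

with $c_0 = 1$ and $c_k = 0$ for $k < 0$ or $k > nc$.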

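(In the following we will omit the factor $1/\lambda^2$ appearing in (37), since the IV
estimates in (6) are invariant to scaling of the weighting matrix $Q$.) Hence
we can consistently estimate the optimal weighting matrix by $\hat S_m^{-1}$, where

$$\hat S_m = \sum_{\tau=-nc}^{nc} \hat r_v(\tau)\,\hat R_z(\tau), \tag{38}$$

and where $\hat r_v(\tau)$ and $\hat R_z(\tau)$ are the sample covariances

$$\hat r_v(\tau) = \frac{1}{N} \sum_{t=\tau+1}^{N} \hat v(t)\,\hat v(t-\tau) = \hat r_v(-\tau), \tag{39}$$

$$\hat R_z(\tau) = \frac{1}{N} \sum_{t} z_m(t)\,z_m^T(t-\tau). \tag{40}$$

Note that (38) provides a method for estimating $S_m$ without explicit
estimation of the $\{c_i\}$ parameters. To estimate $r_v(\tau)$ via (39) we need to
compute (an estimate of)

$$\hat v(t) = \hat A(q^{-1})\,y(t). \tag{41}$$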


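An alternative way of computing $\hat r_v(\tau)$ follows from (41):

$$\hat r_v(\tau) = \sum_{i=0}^{na} \sum_{j=0}^{na} \hat a_i \hat a_j\, \hat r_y(\tau + i - j), \qquad \hat a_0 = 1, \tag{42}$$

where $\hat r_y(k)$ denotes the sample covariance of $y(t)$ at lag $k$.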

Note that both (39), (41) and (42) require knowledge of the $\{a_i\}$ parameters.
Since $A(q^{-1})$ is not known a priori, we must use a multistep procedure. We
first estimate $\{a_i\}$ using equation (6) with $Q = I$ (and $G(q^{-1}) = 1$). The
choice of $m$ will be problem dependent, but generally $m$ will be considerably
larger than na; see [1], [7]. This gives a consistent, although not efficient,
estimate of the AR parameters. These estimates can now be used to compute
$\hat r_v(\tau)$ via (39) and (41), or (42). Next we compute $\hat S_m$ (38) and use it in
the OIV-1 procedure to get the final (asymptotically efficient) estimate of
the AR parameters.
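The covariance route (42) is easy to check numerically. The sketch below is illustrative only: the function name `rv_from_ry` is hypothetical, and the test uses the exact covariances of a simple AR(1) process (for which $v(t)$ is white), so that the expected outcome is known in closed form.

```python
import numpy as np

def rv_from_ry(a, ry, nc):
    """Evaluate (42): covariances of v(t) = A(q^-1) y(t) at lags 0..nc.

    a  : AR polynomial coefficients [1, a_1, ..., a_na]
    ry : covariances of y(t) at lags 0, 1, ..., at least nc + na
    """
    a = np.asarray(a, dtype=float)
    na = len(a) - 1
    return np.array([
        sum(a[i] * a[j] * ry[abs(tau + i - j)]      # r_y is symmetric in the lag
            for i in range(na + 1) for j in range(na + 1))
        for tau in range(nc + 1)
    ])

# AR(1) check: y(t) = 0.5 y(t-1) + e(t) with unit noise variance,
# so r_y(k) = (4/3) * 0.5**|k| and v(t) = e(t) must come out white.
ry_ar1 = (4.0 / 3.0) * 0.5 ** np.arange(3)
rv_check = rv_from_ry([1.0, -0.5], ry_ar1, nc=1)
```

In practice `ry` would hold sample covariances and `a` the initial AR estimates, so the result is only as accurate as those two inputs.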

We can now summarize the proposed implementation of the OIV-1 estimator:

(i) Estimate $\{a_i\}$ by equation (6) with $Q = I$, $G(q^{-1}) = 1$ and $m > na$.

(ii) Compute $\hat R_z(\tau)$ (40) and $\hat r_v(\tau)$ by (39) and (41), or (42), using the
$\{\hat a_i\}$ estimates from step (i), and then compute $\hat S_m$ (38).

(iii) Compute the square root $\hat S_m^{-1/2}$ of $\hat S_m^{-1}$; then solve equation (6) with
$Q^{1/2} = \hat S_m^{-1/2}$ to obtain the final $\{\hat a_i\}$ estimates.
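The three steps can be put together in a compact sketch. Several choices below are assumptions made for illustration and are not prescribed by the paper: the sums in (6) and the matrices $\hat R_z(\tau)$ are replaced by their covariance (Toeplitz) approximations built from sample covariances of $y(t)$, step (iii) is solved through the equivalent weighted normal equations (which presumes that $\hat S_m$ is positive definite), and the names `sample_cov` and `oiv1` are hypothetical.

```python
import numpy as np

def sample_cov(x, max_lag):
    """Biased sample covariances at lags 0..max_lag of a zero-mean sequence."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.array([np.dot(x[k:], x[:N - k]) / N for k in range(max_lag + 1)])

def oiv1(y, na, nc, m):
    """Return (initial estimate from step (i), final estimate from step (iii))."""
    ry = sample_cov(y, nc + m)
    i = np.arange(1, m + 1)[:, None]
    j = np.arange(1, na + 1)[None, :]
    R = -ry[np.abs(nc + i - j)]            # approximates (1/N) sum z_m(t) phi^T(t)
    r = ry[nc + np.arange(1, m + 1)]       # approximates (1/N) sum z_m(t) y(t)

    # Step (i): Q = I, G = 1.
    a_init, *_ = np.linalg.lstsq(R, r, rcond=None)

    # Step (ii): r_v via (42), then S_m via (38) with R_z(tau)[i, j] = r_y(tau + j - i).
    a_ext = np.concatenate(([1.0], a_init))
    rv = np.array([
        sum(a_ext[p] * a_ext[q] * ry[abs(tau + p - q)]
            for p in range(na + 1) for q in range(na + 1))
        for tau in range(nc + 1)
    ])
    lag = np.arange(m)[None, :] - np.arange(m)[:, None]
    S = sum(rv[abs(tau)] * ry[np.abs(tau + lag)] for tau in range(-nc, nc + 1))

    # Step (iii): weighted least squares with Q = S^-1 (normal-equation form).
    S_inv_R = np.linalg.solve(S, R)
    S_inv_r = np.linalg.solve(S, r)
    a_final = np.linalg.solve(R.T @ S_inv_R, R.T @ S_inv_r)
    return a_init, a_final

# Simulated ARMA(2,1) with A = 1 - 1.5 q^-1 + 0.7 q^-2 and C = 1 + 0.5 q^-1 (illustrative).
rng = np.random.default_rng(2)
N = 20000
e = rng.standard_normal(N)
y_sim = np.zeros(N)
for t in range(N):
    y_sim[t] = e[t]
    if t >= 1:
        y_sim[t] += 1.5 * y_sim[t - 1] + 0.5 * e[t - 1]
    if t >= 2:
        y_sim[t] -= 0.7 * y_sim[t - 2]

a_step1, a_oiv1 = oiv1(y_sim, na=2, nc=1, m=6)
```

When $\hat S_m$ turns out not to be positive definite, the normal-equation shortcut used here is no longer appropriate and the eigenvalue modification described next should be applied first.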

Note that the computation of $\hat S_m$ via equation (38) does not guarantee
that $\hat S_m$ will be a positive definite matrix. It may happen, therefore, that
$\hat S_m^{-1/2}$ does not exist. This is unlikely to occur for large $N$, but is quite
likely for small sample sizes (especially if $C(z)$ has roots close to the unit
circle).


The following is a procedure for handling the case where $\hat S_m$ is not
positive definite. Let $\hat\lambda_1 \ge \hat\lambda_2 \ge \dots \ge \hat\lambda_m$ be the ordered eigenvalues of
$\hat S_m$, and let $\{v_i\}_{i=1}^{m}$ be the corresponding eigenvectors. Let

$$\hat\lambda_k \ge \varepsilon, \quad k = 1, \dots, n; \qquad \hat\lambda_k < \varepsilon, \quad k = n+1, \dots, m, \tag{43}$$

with $\varepsilon$ being a (small) positive number. Further, let $\mathcal{S}$ be the class of
positive definite matrices with eigenvalues larger than or equal to $\varepsilon$.
Then, according to Lemma A1 in Appendix A, the Euclidean distance between
$\hat S_m$ and the elements of $\mathcal{S}$ is minimal for the matrix $\bar S_m$ given by

$$\bar S_m = V\,\mathrm{diag}(\hat\lambda_1, \dots, \hat\lambda_n, \varepsilon, \dots, \varepsilon)\,V^T, \tag{44}$$

where $V = [v_1, \dots, v_m]$. We can then use

$$Q^{1/2} = \mathrm{diag}\big(1/\sqrt{\hat\lambda_1}, \dots, 1/\sqrt{\hat\lambda_n}, 1/\sqrt{\varepsilon}, \dots, 1/\sqrt{\varepsilon}\big)\,V^T \tag{45}$$

in (6) instead of $\hat S_m^{-1/2}$, which may not exist. Since $\bar S_m$ must be a
consistent estimate of $S_m$, $\varepsilon$ must go to zero as $N$ tends to infinity. To
guarantee consistency we may set $\varepsilon = 1/N^{\beta}$, $\beta > 0$. As $N \to \infty$ we will
have $\bar S_m \to \hat S_m$, where $\hat S_m$ is a consistent estimate of $\lambda^2 S_m$. Concerning the
choice of $\beta$ we note that the smaller $\varepsilon$, the smaller is the distance between
$\bar S_m$ and $\hat S_m$, cf. Lemma A1. However, too small an $\varepsilon$ may lead to
ill-conditioning problems. Thus $\varepsilon$ should be chosen as a compromise between
accuracy of the solution and numerical stability. Finally, note that if the
estimated covariance matrix $\hat S_m$ happens to have negative eigenvalues then we
may suspect that the $\{\hat a_i\}$ estimates obtained in Step (i) were poor. We may
then wish to repeat Steps (ii)-(iii) using in Step (ii) the improved estimates
of Step (iii).
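The eigenvalue modification above amounts to clipping the spectrum of the estimated matrix from below. A minimal sketch follows; the function name `clipped_inverse_sqrt` and the small indefinite test matrix are illustrative assumptions, and the matrix is symmetrized first as a numerical precaution that the text does not mention.

```python
import numpy as np

def clipped_inverse_sqrt(S_hat, eps):
    """Return (S_bar, Q_half): the eigenvalue-clipped matrix and a square root of its inverse."""
    S_sym = 0.5 * (S_hat + S_hat.T)            # guard against round-off asymmetry
    lam, V = np.linalg.eigh(S_sym)             # eigenvalues (ascending) and eigenvectors
    lam_clipped = np.maximum(lam, eps)         # replace eigenvalues below eps by eps
    S_bar = (V * lam_clipped) @ V.T            # V diag(lam_clipped) V^T
    Q_half = (V / np.sqrt(lam_clipped)).T      # diag(1/sqrt(lam_clipped)) V^T
    return S_bar, Q_half

# An indefinite 2x2 "estimate" with one negative eigenvalue.
S_indefinite = np.array([[2.0, 0.0], [0.0, -0.1]])
S_bar, Q_half = clipped_inverse_sqrt(S_indefinite, eps=1e-3)
```

By construction `Q_half.T @ Q_half` equals the inverse of `S_bar`, so `Q_half` can be used directly as the weighting square root in the least-squares system.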


4.2  THE  OIV-2  ALGORITHM 


The computation of the OIV-2 estimates requires the estimation of the
$\{c_i\}$ parameters. There are, of course, many different ways in which this
could be done. We consider here one such method based on factorization of the
MA spectrum [8]-[10].


Let $S_v(z)$ denote the spectral density function of $v(t)$, (2), (41). We
have

$$S_v(z) = \sum_{k=-nc}^{nc} r_v(k) z^{-k} = \lambda^2 C(z) C(z^{-1}), \tag{46}$$

where $r_v(k)$ denotes the covariance of $v(t)$ at lag $k$ (35). In other words,
the $C(z)$ polynomial is the spectral factor of the spectrum of $v(t)$. This
suggests the following procedure for estimating the $\{c_i\}$ parameters:

(a) Estimate the $\{a_i\}$ parameters using (6) with $Q = I$, $G(q^{-1}) = 1$,
$m > na$.

(b) Compute the sample covariances $\hat r_v(k)$, $k = 0, \dots, nc$, using (39) and
(41) or (42).

(c) Perform spectral factorization of $\hat S_v(z) = \sum_{k=-nc}^{nc} \hat r_v(k) z^{-k}$ to obtain
$\{\hat c_i\}$.
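Step (c) can be carried out in several ways, and the paper does not commit to one. The sketch below uses a simple root-based factorization as one possible choice: the roots of the two-sided covariance polynomial occur in reciprocal pairs, and the ones inside the unit circle define the invertible factor. The function name `ma_spectral_factor` is hypothetical, and the method assumes that no roots lie on the unit circle.

```python
import numpy as np

def ma_spectral_factor(rv):
    """Factor sum_k r_v(k) z^-k = lam2 * C(z) C(z^-1) from covariances r_v(0..nc).

    Returns (c, lam2) with c = [1, c_1, ..., c_nc] and lam2 the noise variance.
    """
    rv = np.asarray(rv, dtype=float)
    nc = len(rv) - 1
    # Coefficients r_v(nc), ..., r_v(1), r_v(0), r_v(1), ..., r_v(nc).
    coeffs = np.concatenate((rv[::-1], rv[1:]))
    roots = np.roots(coeffs)
    inside = roots[np.abs(roots) < 1.0]
    if len(inside) != nc:
        raise ValueError("covariance sequence is not factorizable by this method")
    c = np.real(np.poly(inside))               # monic polynomial with the inside roots
    lam2 = rv[0] / np.sum(c ** 2)              # from r_v(0) = lam2 * sum(c_i^2)
    return c, lam2

# MA(1) check: C = 1 + 0.5 q^-1 with unit noise variance gives r_v = [1.25, 0.5].
c_check, lam2_check = ma_spectral_factor([1.25, 0.5])
```

With sample covariances the `ValueError` branch corresponds to the non-factorizable case discussed in the text, for which factoring the squared spectrum is suggested instead.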

Note that the sample covariance sequence $\{\hat r_v(0), \dots, \hat r_v(nc), 0, 0, \dots\}$
is not guaranteed to be positive definite. Thus, $\hat S_v(z)$ may not be
factorizable. This may happen in the small sample case, especially when $C(z)$
has roots close to the unit circle. However, note that OIV-2 requires an
estimate of $C^2(q^{-1})$ rather than of $C(q^{-1})$. We can always obtain a consistent
estimate of $C^2(q^{-1})$ by factoring $\hat S_v^2(z)$, since

$$\hat S_v^2(e^{i\omega}) = \widehat{C^2}(e^{i\omega})\,\widehat{C^2}(e^{-i\omega}) \ge 0 \quad \text{for all } \omega, \tag{47}$$

even though $\hat S_v(e^{i\omega})$ may be negative for some values of $\omega$.

We can now summarize the proposed implementation of OIV-2:

(i) Estimate $\hat C(q^{-1})$ using the spectral factorization method described
above. Let $G(q^{-1}) = 1/\hat C^2(q^{-1})$.

(ii) Estimate the AR parameters using equation (6) with $Q = I$, $m = na$
and $G(q^{-1})$ from step (i).
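Step (ii) can be sketched as follows, assuming that an estimate of $C(q^{-1})$ is already available from step (i). The helper names `all_pole_filter` and `oiv2_step` are hypothetical, the pre-filter is implemented as a plain recursion rather than with a signal-processing library, and the test simply plugs in the true $C(q^{-1})$ of a simulated process in place of an estimate.

```python
import numpy as np

def all_pole_filter(den, x):
    """Filter x through 1/den(q^-1), with den = [1, d_1, ..., d_n] and zero initial state."""
    out = np.zeros(len(x))
    for t in range(len(x)):
        acc = x[t]
        for k in range(1, len(den)):
            if t - k >= 0:
                acc -= den[k] * out[t - k]
        out[t] = acc
    return out

def oiv2_step(y, c_hat, na):
    """Solve (6) with Q = I, m = na and instruments pre-filtered by 1/C_hat^2."""
    y = np.asarray(y, dtype=float)
    nc = len(c_hat) - 1
    w = all_pole_filter(np.convolve(c_hat, c_hat), y)   # w(t) = y(t) / C_hat^2(q^-1)
    R = np.zeros((na, na))
    r = np.zeros(na)
    for t in range(nc + na, len(y)):
        phi = -y[t - na:t][::-1]                 # [-y(t-1), ..., -y(t-na)]
        z = w[t - nc - na:t - nc][::-1]          # filtered [y(t-nc-1), ..., y(t-nc-na)]
        R += np.outer(z, phi)
        r += z * y[t]
    return np.linalg.solve(R, r)                 # square system since m = na

# Simulated ARMA(2,1) with A = 1 - 1.5 q^-1 + 0.7 q^-2 and C = 1 + 0.5 q^-1 (illustrative).
rng = np.random.default_rng(3)
N = 20000
e = rng.standard_normal(N)
y_sim = np.zeros(N)
for t in range(N):
    y_sim[t] = e[t]
    if t >= 1:
        y_sim[t] += 1.5 * y_sim[t - 1] + 0.5 * e[t - 1]
    if t >= 2:
        y_sim[t] -= 0.7 * y_sim[t - 2]

a_oiv2 = oiv2_step(y_sim, c_hat=np.array([1.0, 0.5]), na=2)
```

Because $m = na$ the system is square, so no weighting matrix enters; all of the accuracy gain comes from the pre-filter.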

Note that it is possible to iterate this procedure by using the AR
parameters obtained in step (ii) to improve the estimate of $C(q^{-1})$, by
repeating step (i) (the factorization method) with the new $\{\hat a_i\}$ parameters.

4.3 THE OIV-3 ALGORITHM


The computation of the OIV-3 estimates is very similar to that of OIV-2.
The only difference is that $G(q^{-1}) = \hat{A}(q^{-1})/\hat{C}^2(q^{-1})$, where
$\hat{A}(q^{-1})$ is the current estimate of $A(q^{-1})$ (obtained from step (a) in the
first iteration of the algorithm, or from the previous step (ii) in the case
of re-iteration).


4.4  COMPUTATIONAL  REQUIREMENTS 


The  following  is  a  brief  summary  of  the  number  of  arithmetic  operations 
(i.e.,  multiplies  and  adds)  required  by  each  of  the  algorithms  described 
above. 


OIV-1:

Step (i): requires approximately $(m+na)N$ operations to compute the sample
covariances and ~$m^3$ operations to solve for the initial estimate (solutions
requiring only ~$m^2$ operations are also possible if the Toeplitz structure of
the Yule-Walker equation is used). Step (ii): requires ~$(na)^2 nc$ operations
to evaluate $\hat{r}_v(\tau)$ using (42), or ~$(na+nc)N$ operations using (39), (41). The
computation of $\hat{S}_m$ requires ~$nc \cdot m^2$ operations. Step (iii): requires ~$3m^3$
operations. A recursive QR algorithm which appears to be useful for solving
(6) is presented in Appendix B.


OIV-2:

Step (a): requires ~$(m+na)N + m^3$ operations, as in the case of OIV-1. Step
(b): involves the computation of $\hat{r}_v(\tau)$, which requires either ~$(na)^2 nc$ or
~$(na+nc)N$ operations. Step (c): computational requirements will depend on
the particular factorization technique. Step (ii): requires
~$2(nc+na)N + (na)^3$ operations.

OIV-3:

Steps (a)-(c): same as OIV-2. Step (ii): same as step (iii) of OIV-1,
with the addition of $(2nc+na)N$ operations to perform the pre-filtering.

In summary:

OIV-1:  $(m+na)N + 4m^3 + nc \cdot m^2$   (or $(m+2na+nc)N + 4m^3 + nc \cdot m^2$)

OIV-2:  $(m+2na+2nc)N + m^3 + na^2(nc+na)$   (or $(m+4na+4nc)N + m^3 + na^3$)

OIV-3:  $(m+2na+2nc)N + na^2 nc$   (or $(m+3na+3nc)N + 4m^3$)

Note also that re-iteration of OIV-1 does not require much computation
since the sample covariances need to be computed only once. Iteration of
OIV-2 and OIV-3 is more costly since the data need to be refiltered and some
sample covariances recomputed at each iteration.
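
As a rough illustration of how these counts compare, the sketch below evaluates the first variant of each summary formula. The exponents are taken as read from the summary above, and the function name and the sample values (N = 4096, m = 10, na = nc = 2) are chosen only for this example.

```python
def oiv_operation_counts(N, m, na, nc):
    """Approximate multiply/add counts, first variant of each summary formula.

    Assumption: the exponents follow the summary table as reconstructed
    from the text (m^3, m^2, na^2); they are not independently verified.
    """
    return {
        "OIV-1": (m + na) * N + 4 * m**3 + nc * m**2,
        "OIV-2": (m + 2 * na + 2 * nc) * N + m**3 + na**2 * (nc + na),
        "OIV-3": (m + 2 * na + 2 * nc) * N + na**2 * nc,
    }


counts = oiv_operation_counts(N=4096, m=10, na=2, nc=2)
for name, ops in counts.items():
    print(f"{name}: about {ops} operations")
```

For long data records the terms proportional to N dominate, so the cost is driven mainly by the covariance computation and the pre-filtering.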

5.  NUMERICAL  EXAMPLES 

In  this  section  we  present  some  selected  results  of  computer  simulations 
which  illustrate  the  behavior  of  the  OIV  algorithms  discussed  earlier.  Tables 
1-10  summarize  results  based  on  100  independent  Monte-Carlo  runs  performed  for 
each  of  the  test  cases  described  below.  Each  of  the  tables  contains  the  means 
and  standard  deviations  (as  well  as  mean-squared-errors)  of  the  AR  parameter 
estimates  obtained  by  applying  the  MYW,  OIV-1,  OIV-2  and  OIV-3  algorithms  to 
simulated  data.  The  OIV  algorithms  were  used  with  different  values  of  m  and 
iterated  three  times. 

Note that OIV-2 was run for values of m different from m=na. The
asymptotic theory shows that OIV-2 is optimal only for m=na, not for m > na.
However, in the finite data case we found that increasing m tended to make the
algorithm more robust by reducing the probability of singularity of the matrix
which needs to be inverted.

In the first two cases the data were the sum of a second order
autoregressive process and white noise:

$y(t) = x(t) + n(t)$,   (48)

where

$x(t) = -a_1 x(t-1) - a_2 x(t-2) + w(t)$   (49)

and where $w(t)$ and $n(t)$ are mutually uncorrelated white noise processes whose
variances were chosen to give the desired signal-to-noise ratio
(SNR = Var{x(t)}/Var{n(t)}). As is well-known, $y(t)$ has an equivalent ARMA
(2,2) representation. The zeroes of the MA part can be shown to be farther
away from the unit circle than the zeroes of the AR part. As the SNR
decreases, the MA zeroes approach the AR zeroes.
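
A minimal sketch of how such test data could be generated is given below. It is not the simulation code used for the reported results; the function names, the burn-in length and the use of Gaussian noise are assumptions made for illustration. The noise variance is set from the theoretical variance of x(t), so that SNR = Var{x(t)}/Var{n(t)}.

```python
import numpy as np


def ar_variance(a, n_terms=5000):
    """Variance of x(t) for unit-variance w(t), computed as the sum of the
    squared impulse response of 1/A(q^-1). `a` is [1, a_1, ..., a_na]."""
    a = np.asarray(a, dtype=float)
    h = np.zeros(n_terms)
    for t in range(n_terms):
        acc = 1.0 if t == 0 else 0.0
        for i in range(1, len(a)):
            if t - i >= 0:
                acc -= a[i] * h[t - i]
        h[t] = acc
    return float(np.sum(h**2))


def simulate_ar_plus_noise(a, snr_db, N, seed=0, burn_in=1000):
    """Generate y(t) = x(t) + n(t) as in (48)-(49); returns (y, x, n)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    total = N + burn_in
    w = rng.standard_normal(total)          # unit-variance driving noise
    x = np.zeros(total)
    for t in range(total):
        acc = w[t]
        for i in range(1, len(a)):
            if t - i >= 0:
                acc -= a[i] * x[t - i]
        x[t] = acc
    x = x[burn_in:]                          # drop the start-up transient
    noise_var = ar_variance(a) / 10.0 ** (snr_db / 10.0)
    n = np.sqrt(noise_var) * rng.standard_normal(N)
    return x + n, x, n
```

For example, `simulate_ar_plus_noise([1.0, -1.4, 0.95], 20.0, 4096)` produces one record with the AR polynomial and SNR of Case 1.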

Case 1: Narrowband, high SNR

$A(z) = 1 - 1.4z^{-1} + 0.95z^{-2}$, (zeroes at $0.975\,e^{\pm j44.1^\circ}$),

SNR = 20 dB, N = 4096

The MA polynomial of the equivalent ARMA representation of this process is

$C(z) = 1 - 0.3155z^{-1} + 0.1233z^{-2}$, (zeroes at $0.351\,e^{\pm j63.3^\circ}$)

The  results  are  summarized  in  tables  1  and  2.  In  this  high  SNR  case  the 
experimental  results  are  very  close  to  the  asymptotic  bounds. 
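
The equivalent MA polynomial quoted for this case can be checked numerically. The sketch below assumes unit-variance w(t), forms the spectrum of the MA part as $\sigma_w^2 + \sigma_n^2 A(z)A(z^{-1})$, and keeps the roots inside the unit circle; the function names are illustrative only.

```python
import numpy as np


def ar_variance(a, n_terms=5000):
    """Variance of the AR process for unit-variance driving noise."""
    a = np.asarray(a, dtype=float)
    h = np.zeros(n_terms)
    for t in range(n_terms):
        acc = 1.0 if t == 0 else 0.0
        for i in range(1, len(a)):
            if t - i >= 0:
                acc -= a[i] * h[t - i]
        h[t] = acc
    return float(np.sum(h**2))


def equivalent_ma(a, snr_db):
    """MA polynomial [1, c_1, ..., c_na] of the ARMA model equivalent to
    an AR process observed in white noise at the given SNR (in dB)."""
    a = np.asarray(a, dtype=float)
    na = len(a) - 1
    noise_var = ar_variance(a) / 10.0 ** (snr_db / 10.0)
    # Coefficients of sigma_n^2 * A(z)A(1/z) for lags -na..na (symmetric).
    s = noise_var * np.convolve(a, a[::-1])
    s[na] += 1.0                             # add the variance of w(t)
    roots = np.roots(s)
    inside = roots[np.abs(roots) < 1.0]      # minimum-phase spectral factor
    return np.real(np.poly(inside))


print(equivalent_ma([1.0, -1.4, 0.95], 20.0))
print(equivalent_ma([1.0, -1.4, 0.95], 0.0))
```

The two calls correspond to the 20 dB and 0 dB settings used in Cases 1 and 2.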

Case 2: Narrowband, low SNR

As in case 1, with SNR = 0 dB

$C(z) = 1 - 1.20955z^{-1} + 0.726837z^{-2}$, (zeroes at $0.853\,e^{\pm j44.8^\circ}$)

The  results  are  summarized  in  tables  3  and  4. 
TABLE 4: Experimental and Theoretical Estimation Accuracy for Case 2, Parameter a
TABLE 5: Experimental and Theoretical Estimation Accuracy for Case 3, Parameter a
TABLE 6: Experimental and Theoretical Estimation Accuracy for Case 3, Parameter a
TABLE 7: Experimental and Theoretical Estimation Accuracy for Case 4, Parameter a
                                                                                              .006749
        Theoretical std. dev.

OIV-1   Iteration 1   Mean ± std. dev.   -                  -1.687 ± .005716   -1.687 ± .005296   -1.686 ± .009568
                      mse                -                  .00582             .005359            .009733
        Iteration 2   Mean ± std. dev.   -                  -1.687 ± .005715   -1.687 ± .005325   -1.688 ± .01345
                      mse                -                  .005822            .005387            .01345
        Iteration 3   Mean ± std. dev.   -                  -1.687 ± .005716   -1.687 ± .005325   -1.689 ± .01890
                      mse                -                  .005822            .005387            .01890
        Theoretical std. dev.

OIV-2   Iteration 1   Mean ± std. dev.   -1.688 ± .005354   -1.688 ± .005325   -1.688 ± .005607   -1.688 ± .006522
                      mse                .005397            .005371            .005651            .006550
        Iteration 2   Mean ± std. dev.   -1.687 ± .005552   -1.688 ± .005458   -1.688 ± .00524    -1.688 ± .006709
                      mse                (illegible)        .005512            .005566            .005736
        Iteration 3   Mean ± std. dev.   -1.688 ± .005383   -1.688 ± .005635   -1.688 ± .005552   -1.688 ± .0066663
                      mse                .005423            .005677            .005594            .006690
        Theoretical std. dev.            -                  -                  -

OIV-3   Iteration 1   Mean ± std. dev.   -                  -1.687 ± .01697    -1.687 ± .009324   -1.686 ± .006355
                      mse                -                  .01701             .009392            .006620
        Iteration 2   Mean ± std. dev.   -                  -1.692 ± .01869    -1.689 ± .009072   -1.687 ± .005980
                      mse                -                  .01901             .009116            .006050
        Iteration 3   Mean ± std. dev.   -                  -1.692 ± .02019    -1.689 ± .009854   -1.687 ± .006132
                      mse                -                  .02046             .009880            .006221
        Theoretical std. dev.

TABLE 3: Experimental and Theoretical Estimation Accuracy for Case 2, Parameter $a_2$

m                                        2                 4                 10                40

MYW                   Mean ± std. dev.   .9488 ± .007186   .9487 ± .006604   .9488 ± .006789   .9491 ± .007339
                      mse                .007325           .006738           .006885           .007399
        Theoretical std. dev.

OIV-1   Iteration 1   Mean ± std. dev.   -                 .9487 ± .006703   .9488 ± .006391   .9467 ± .01466
                      mse                -                 .006836           .006499           .01503
        Iteration 2   Mean ± std. dev.   -                 .9487 ± .006691   .9488 ± .006373   .9497 ± .01053
                      mse                -                 .006825           .006480           .01053
        Iteration 3   Mean ± std. dev.   -                 .9487 ± .006674   .9488 ± .006336   .9481 ± .01965
                      mse                -                 .006808           .006445           .01974
        Theoretical std. dev.

OIV-2   Iteration 1   Mean ± std. dev.   .9489 ± .006385   .9489 ± .006409   .9491 ± .006616   .9492 ± .007355
                      mse                .006478           .006498           .006684           .007395
        Iteration 2   Mean ± std. dev.   .9489 ± .006445   .9489 ± .006415   .9491 ± .006593   .9492 ± .007387
                      mse                .006544           .00650            .006658           .007426
        Iteration 3   Mean ± std. dev.   .9490 ± .006415   .9489 ± .006397   .9491 ± .006622   .9492 ± .007387
                      mse                .006500           .006483           .006687           .007426
        Theoretical std. dev.

OIV-3   Iteration 1   Mean ± std. dev.   -                 .9480 ± .02086    .9483 ± .01044    .9479 ± .006935
                      mse                -                 .02095            .01057            .007243
        Iteration 2   Mean ± std. dev.   -                 .9531 ± .02286    .9506 ± .01010    .9489 ± .006760
                      mse                -                 .02307            .01011            .006855
        Iteration 3   Mean ± std. dev.   -                 .9536 ± .02681    .9506 ± .01087    .9488 ± .006845
                      mse                -                 .02705            .01089            .006950
        Theoretical std. dev.

bi-diagonalization and a QR algorithm. Solutions corresponding to singular
values less than $10^{-6}$ times the largest singular value were set to zero (in
effect decreasing the assumed rank of the MYW equations and producing the
minimum-norm solution of the underdetermined set of equations).

OIV-1: The algorithm was implemented as described in section 4 and appears to
be quite robust. Since our simulations involved relatively long data records
we did not encounter problems with $\hat{S}_m$ being non-positive definite. Thus, we
did not have to use the procedure described in (43)-(45). In fact we computed
$\hat{S}_m^{-1/2}$ by the Levinson-Durbin algorithm, applied to the first column of
$\hat{S}_m$. We used equations (39), (41) to estimate $\hat{r}_v(\tau)$.

OIV-2: The factorization of $\hat{S}_v^2(z)$ was performed by computing the roots
of $z^{nc} \hat{S}_v(z)$. All the roots outside the unit circle were reflected inside
the unit circle, and the complete set of roots was then used to compute
$\hat{C}^2(z)$. In this case we noticed that the filtering operation (by
$G(q^{-1}) = 1/\hat{C}^2(q^{-1})$) introduced a transient which needed to be eliminated.
To limit the duration of the transient we "contracted" the roots of the
polynomial $\hat{C}^2(z)$ by replacing
$\hat{C}^2(z) = 1 + c'_1 z^{-1} + \ldots + c'_{2nc} z^{-2nc}$ by
$\hat{C}^2(z/\eta) = 1 + c'_1 \eta z^{-1} + \ldots + c'_{2nc} \eta^{2nc} z^{-2nc}$, where $\eta = 0.99$.
By construction, the roots of $\hat{C}^2(z)$ have maximum modulus of 1. To eliminate
the effects of transients in $\{G(q^{-1})y(t)\}$, the first 200 samples of the
filtered data were discarded.
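
The root-reflection and root-contraction steps described above can be sketched as follows. This is an illustrative reading of the procedure, not the original implementation; the function name and argument layout are assumptions, and the input is taken to be the covariances of v(t) at lags 0 to nc.

```python
import numpy as np


def c_squared_from_covariances(r, eta=0.99):
    """Estimate the coefficients of C^2(z) (in powers of z^-1) from the
    covariances r[k], k = 0..nc, by rooting z^nc * S_v(z), reflecting
    roots lying outside the unit circle, and contracting by eta."""
    r = np.asarray(r, dtype=float)
    # Coefficients of z^nc * S_v(z), highest power first:
    # r[nc], ..., r[1], r[0], r[1], ..., r[nc]
    poly = np.concatenate([r[:0:-1], r])
    roots = np.roots(poly).astype(complex)
    outside = np.abs(roots) > 1.0
    roots[outside] = 1.0 / np.conj(roots[outside])   # reflect inside
    c2 = np.real(np.poly(roots))                     # monic, degree 2*nc
    k = np.arange(len(c2))
    return c2 * eta**k                               # C^2(z/eta)


# Example with exact covariances of an MA(2) process with unit lambda^2.
c = np.array([1.0, -0.3155, 0.1233])
r = np.array([np.sum(c[: len(c) - k] * c[k:]) for k in range(len(c))])
print(c_squared_from_covariances(r, eta=1.0))   # close to the square of C
```

With `eta=1.0` and exact covariances, the result coincides with the coefficients of the product C(z)C(z); with `eta=0.99` every root is pulled slightly further inside the unit circle, which shortens the filter transient.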

OIV-3: Implementation was very similar to OIV-2.

6.  CONCLUSIONS 

We presented several multistep implementations of optimal instrumental
variable algorithms for estimating the AR parameters of an ARMA process. These
algorithms were shown to provide asymptotically efficient estimates of the AR
parameters at a modest computational cost, compared to methods such as the
Maximum Likelihood Estimator. The OIV algorithms are useful in situations
where the MYW method does not provide accurate estimates (e.g., for ARMA
processes with zeroes near the unit circle). The performance of the proposed
algorithms was illustrated by selected numerical examples.
APPENDIX A. THE BEST POSITIVE-DEFINITE APPROXIMATION OF
A SYMMETRIC MATRIX


Let A be an m×m symmetric matrix. Let $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_m$ be its
eigenvalues and $v_1, \ldots, v_m$ be the corresponding eigenvectors. We have the

following  result,  which  is  a  slight  modification  of  a  similar  result  given  in 


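Lemma A1. Let $\mathcal{B}$ be the class of positive definite m×m matrices with
eigenvalues larger than or equal to a given (small) positive number $\epsilon$.
Then

$\inf_{B \in \mathcal{B}} \|A-B\|^2 = (\lambda_{n+1}-\epsilon)^2 + \ldots + (\lambda_m-\epsilon)^2$   (A.1)

where $\|A\| = [\mathrm{tr}\, AA^T]^{1/2} = [\sum_i \sum_j a_{ij}^2]^{1/2}$ denotes the Euclidean norm, and
$\lambda_{n+1}, \ldots, \lambda_m$ are the eigenvalues of A that are smaller than $\epsilon$, that is

$\lambda_k \ge \epsilon$,   $k = 1, \ldots, n$
                                                   (A.2)
$\lambda_k < \epsilon$,   $k = n+1, \ldots, m$

Furthermore, the infimum is attained for

$B = \sum_{k=1}^{n} \lambda_k v_k v_k^T + \epsilon \sum_{k=n+1}^{m} v_k v_k^T$   (A.3)

Proof: We have

$\|A-B\|^2 = \sum_{i=1}^{m} (\lambda_i - c_{ii})^2 + \sum_{i,j=1,\, i \ne j}^{m} c_{ij}^2$   (A.4)

where $c_{ij}$ is the i,j-element of

$C = V^T B V$,   $V = [v_1 \; \ldots \; v_m]$.

Clearly C has the same eigenvalues as B. Thus we can write

$\|A-B\|^2 \ge \sum_{i=1}^{m} (\lambda_i - c_{ii})^2 \ge \sum_{i=n+1}^{m} (\lambda_i - c_{ii})^2 \ge \sum_{i=n+1}^{m} (\lambda_i - \epsilon)^2$

where the equalities hold if

$c_{ij} = 0$, $i \ne j$;   $c_{ii} = \lambda_i$, $i = 1, \ldots, n$;   $c_{ii} = \epsilon$, $i = n+1, \ldots, m$   (A.5)

By inserting (A.5) in (A.4) we readily obtain (A.3).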
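
The construction in Lemma A1 amounts to raising every eigenvalue below $\epsilon$ up to $\epsilon$ while keeping the eigenvectors. A minimal sketch is given below; the function name is an assumption, and `numpy.linalg.eigh` is used for the symmetric eigendecomposition.

```python
import numpy as np


def nearest_positive_definite(A, eps=1e-6):
    """Closest matrix (Euclidean/Frobenius norm) to the symmetric matrix A
    among matrices whose eigenvalues are all >= eps.

    Returns the approximation B and the squared distance ||A - B||^2.
    """
    A = np.asarray(A, dtype=float)
    lam, V = np.linalg.eigh(A)               # A = V diag(lam) V^T
    lam_clipped = np.maximum(lam, eps)       # raise small eigenvalues to eps
    B = (V * lam_clipped) @ V.T
    err2 = float(np.sum((lam[lam < eps] - eps) ** 2))
    return B, err2


A = np.array([[1.0, 2.0], [2.0, 1.0]])       # eigenvalues 3 and -1
B, err2 = nearest_positive_definite(A, eps=0.5)
print(B, err2)
```

The returned squared distance is the sum of $(\lambda_k - \epsilon)^2$ over the eigenvalues that were raised, in agreement with (A.1).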

APPENDIX B: A RECURSIVE QR ALGORITHM FOR SOLVING (6)

Let us rewrite equation (6) as

[equation illegible in source]

where

[definitions illegible in source]

Let $L_m$ be factored as

$L_m = Q_m T_m$

where

$Q_m$:  an orthogonal matrix
$T_m$:  an upper triangular matrix

Then $\hat{\theta}_m$ can be computed by back-substitution from

$T_m \hat{\theta}_m = Q_m^T \ell_m$

Consider now the situation for (m+1). Determine first $\alpha$ and $\beta$ in

[equation illegible in source]

and then

[equation illegible in source]

We have

[equation illegible in source]

So the problem of factorizing $L_{m+1}$ reduces to the factorization of

[matrix illegible in source]

In this last matrix only the last row $\gamma$ needs to be made zero. The
computations needed are clearly simpler than if the matrix would have been
full. Let $V_{m+1}$ be an orthogonal matrix such that

[equations (B.11) and (B.12) illegible in source]

Finally, we have

[equation (B.13) illegible in source]

The estimate $\hat{\theta}_{m+1}$ is computed from

[equations (B.14) and (B.15) illegible in source]



AN  APPROXIMATE  MAXIMUM  LIKELIHOOD  APPROACH  TO 
ARMA  SPECTRAL  ESTIMATION 


Petre Stoica, Benjamin Friedlander and Torsten Soderstrom


ABSTRACT 


A  three-step  approximate  maximum  likelihood  method  for  ARMA  spectral 
estimation  is  derived,  based  on  an  idea  due  to  Walker.  The  asymptotic 
properties  of  the  proposed  estimator  are  investigated  and  an  explicit 
expression  for  its  asymptotic  covariance  matrix  is  presented.  The  estimator 
provides  the  asymptotic  accuracy  of  a  maximum  likelihood  technique,  at  a 
modest  computational  cost. 


The  work  of  B.  Friedlander  was  supported  by  the  Army  Research  Office  under 
contract  No.  DAAG29-83-C-0027.  P.  Stoica  is  with  Facultatea  de  Automatica, 
Institutul  Politehnic  Bucuresti,  Splaiul  Independentei  313,  Sector  6,  R-77  206 
Bucharest,  Romania.  B.  Friedlander  is  with  Systems  Control  Technology,  Inc., 

1. INTRODUCTION


Autoregressive moving-average (ARMA) spectral estimation is a topic of
considerable interest in engineering, econometrics, biometrics, statistics and
other areas [1]-[8], [16]-[22]. Many different methods have been proposed for
estimating the ARMA spectrum, including: (i) Optimization-based procedures
such as the maximum likelihood (ML) method and various nonlinear least-squares
techniques [1],[3],[4],[8],[21],[22]. These methods tend to be
computationally intensive and have inherent difficulties due to possible
convergence to local minima. (ii) Techniques based on the Yule-Walker method
and its many variations [1],[5]-[7],[12],[13],[23]-[25]. These methods
involve the solution of a linear set of equations and do not suffer from
convergence to false minima. However, the accuracy of the estimates may be
poor unless special measures are taken, such as increasing the number of
equations [7],[12], increasing the order of the model [5]-[7] or choosing an
optimal weighting matrix [23]-[25].

In  this  paper  we  develop  an  estimation  technique  which  combines  the 
computational  simplicity  of  the  Yule-Walker  based  methods  with  the  accuracy  of 
ML  techniques.  The  proposed  estimator  is  based  on  an  idea  due  to  Walker  [10], 
[11],  involving  large-sample  approximate  ML  estimation  of  the  covariances  of 
the  observed  ARMA  process.  These  covariance  estimates  are  then  used  in  a 
Yule-Walker based procedure to obtain approximate ML estimates of the ARMA
spectral  parameters. 

The spectral estimation method proposed here is more general than the
related method in [11] (see also [9], 06], [28]). Walker considered the
estimation of correlation coefficients instead of covariances and his results
cannot be used in a straightforward manner for ARMA spectrum estimation. We
introduce here an estimation technique based on a maximum likelihood approach
similar to, but simpler than, the approach used in [11] (see also [33]). A
large-sample ML method is introduced and its accuracy properties are
established in a general setting. This general analysis is believed to be
interesting in its own right, and could be used to obtain large-sample ML
estimates for various estimation problems besides the one considered here (see
[25]). The ARMA spectral estimator derived here is shown to be asymptotically
efficient. The proof of its efficiency is a key contribution of this paper.

The  outline  of  the  paper  is  as  follows:  In  section  2  we  present  the 
spectral  model  considered  in  this  paper  and  discuss  some  alternative 
parametrizations.  A  large  sample  approximate  solution  to  a  general  maximum 
likelihood  estimation  problem  is  derived  in  section  3  and  its  accuracy 
properties  are  discussed.  In  section  4  we  specialize  this  approximate  ML 
approach  to  the  ARMA  spectral  estimation  problem.  A  specific  estimation 
algorithm is proposed. The asymptotic accuracy properties of the proposed
estimator  are  discussed  in  section  5,  and  its  asymptotic  error  covariance  is 
compared  to  the  Cramer-Rao  lower  bound  in  section  6. 

2.  THE  SPECTRAL  MODEL 

Consider the following ARMA process of order (na,nc)

$A(q^{-1}) y(t) = C(q^{-1}) e(t)$,   (2.1)

where

$e(t)$ = white noise process with zero mean and variance $\lambda^2$,

$A(q^{-1}) = 1 + a_1 q^{-1} + \ldots + a_{na} q^{-na}$,

$C(q^{-1}) = 1 + c_1 q^{-1} + \ldots + c_{nc} q^{-nc}$,

$q^{-1}$ = unit delay operator ($q^{-1} y(t) = y(t-1)$).

The following standard assumptions are made:

A1: $A(z) \cdot C(z) = 0 \Rightarrow |z| > 1$

In other words, the ARMA representation (2.1) is stable and invertible. This
is not a restrictive assumption, cf. the spectral factorization theorem
[29]. We note, however, that there are some cases of interest where A1 does
not hold. For example, the sinusoids-in-noise process can be described by an
ARMA model (2.1) with $A(z) \equiv C(z)$ and $A(z) = 0 \Rightarrow |z| = 1$, [1]-[3]. As we
shall explain later, the method of this paper does not extend to such
"degenerate" ARMA processes.

A2: $a_{na} \cdot c_{nc} \ne 0$ and $\{A(z), C(z)\}$ are coprime polynomials.

In other words, (na,nc) are the minimal orders of the ARMA model (2.1). In
the following we assume for simplicity that (na,nc) are given.

Next we introduce the following notation:

$r_k = E\{y(t) y(t-k)\}$ = the covariance of $y(t)$ at lag k,   (2.2a)

$\phi(z) = \sum_{k=-\infty}^{\infty} r_k z^{-k}$ = the spectral density of $y(t)$.   (2.2b)

In (2.2) $E\{\cdot\}$ denotes the expectation operator and z is a complex variable.

It is well known that

$\phi(z) = \lambda^2 \dfrac{C(z) C(z^{-1})}{A(z) A(z^{-1})}$.   (2.3)

Thus, $\phi(z)$ could be parametrized via $\{a_i\}$, $\{c_j\}$ and $\lambda^2$. The statistically
efficient estimation of these parameters is not an easy task (even though
asymptotically efficient estimates of $\{a_i\}$ can be obtained by using only
linear operations [23]).


In this paper we parametrize $\phi(z)$ by the covariances
$\{r_k, k = 0, \ldots, na+nc\}$. These covariances uniquely define $\phi(z)$. The
sequence $\{r_k\}$ satisfies the well-known Yule-Walker equations:

$r_k + a_1 r_{k-1} + \ldots + a_{na} r_{k-na} = 0$,   $k \ge nc + 1$.   (2.4)

Introduce the notation

$b_k \triangleq E\{A(q^{-1}) y(t) \cdot A(q^{-1}) y(t-k)\} = \sum_{i=0}^{na} \sum_{j=0}^{na} a_i a_j r_{k+j-i}$,   (2.5)

where $a_0 = 1$. It then readily follows from (2.1), (2.3) and (2.5) that

$\phi(z) = \dfrac{\sum_{k=-nc}^{nc} b_k z^{-k}}{A(z) A(z^{-1})}$.   (2.6)

Note that the numerator of (2.6) is a function of $\{r_k, k = 0, \ldots, na+nc\}$. Next
note that the coefficients $\{a_i, i = 1, \ldots, na\}$ can be uniquely determined from
$\{r_k, k = 0, \ldots, na+nc\}$ by using (2.4). This is possible since under the
assumptions A1 and A2 the matrix

$R \triangleq \begin{bmatrix} r_{nc} & \cdots & r_{nc+1-na} \\ r_{nc+1} & \cdots & r_{nc+2-na} \\ \vdots & & \vdots \\ r_{nc+na-1} & \cdots & r_{nc} \end{bmatrix}$   (2.7)

arising from the system of equations (2.4), is nonsingular [11],[23]. This
concludes the proof that $\phi(z)$ can be uniquely parametrized by the set of
covariances

$\theta = [r_0, r_1, \ldots, r_{na+nc}]^T$.   (2.8)

Another parametrization of $\phi(z)$ was considered by Walker [11] and Cadzow
[12]. Walker parametrized $\phi(z)$ by

$[r_0, r_1, \ldots, r_{nc}, a_1, \ldots, a_{na}]$   (2.9)

For na < 2nc + 1 it can be easily shown that $\phi(z)$ can be expressed as a
function of the parameters in (2.9). For na > 2nc + 1, however, this is no
longer obvious. Walker [11] gave a formula expressing $\phi(z)$ as a
function of (2.9), which appears to be in error. The simplicity of the
parametrization of $\phi(z)$ via (2.8) was one of the reasons for preferring (2.8)
to (2.9).


The parametrization of $\phi(z)$ used by Cadzow [12] (see also [13]) is shown
in [33, Appendix D] to be a special case of (2.9), and is valid only for
nc > na (compared to the constraint na < 2nc + 1 mentioned above). Due to
this constraint it cannot be used for arbitrary ARMA processes.

Finally note that replacing $\{r_k, k = 0, \ldots, na+nc\}$ in (2.4)-(2.6) by some
(consistent) estimate will produce a (consistent) estimate of the spectral
density, which is not guaranteed to be nonnegative on the unit circle. The
same is true when using (2.9) to parametrize $\phi(z)$. This problem is
discussed in more detail in [30],[31], where a remedy is proposed.

3.  A  MAXIMUM  LIKELIHOOD  ESTIMATION  PROBLEM  AND  ITS  LARGE-SAMPLE  SOLUTION 

In the next section we will discuss an approximate ML method for
estimating the covariances $\{r_k, k = 0, \ldots, na+nc\}$ characterizing the ARMA
process (2.1). In this section we present in a general setting the basic
ideas behind that method. As was mentioned earlier, our approach follows that
of Walker [11], who parametrized the ARMA process via (2.9). To estimate these
parameters (more precisely, the parameters $\{r_1/r_0, \ldots, r_{nc}/r_0, a_1, \ldots, a_{na}\}$)
he considered a more complicated approach than the one presented here. We
formalized the basic ideas behind Walker's approach in [33, Appendix E]. We
note that the approaches presented in this section and in [33, Appendix E] may
be useful in deriving new estimators for other estimation problems besides the
one treated here, see e.g. [25].

Let X be a random m-vector which is completely determined from the
available N data samples. Let $\theta$ denote the $n\theta$-vector of unknown parameters
to be estimated. Assume that for $N \to \infty$ the distribution of X is completely
determined by $\theta$. Furthermore, assume that

$\sqrt{N}\,(X - \bar{X}) \xrightarrow{\text{dist}} \mathcal{N}(0, W)$,   (3.1a)

where

$\bar{X} = \begin{bmatrix} \theta \\ 0 \end{bmatrix}$,   (3.1b)

and where the covariance matrix W (assumed to be nonsingular) may depend on
$\theta$. Finally, assume that an estimate $\hat{W}$ of W, which is such that
$|\hat{W} - W| = O(1/\sqrt{N})$, can be calculated from the available data. Under these
conditions we will derive a simple large sample approximate ML estimate of
$\theta$.

Since we consider the large-sample case, assumption (3.1a) is not too
restrictive. Many statistics have an asymptotically Gaussian distribution
according to various central limit theorems. The choice of X so as to fulfil
(3.1b) is the critical point in applying the approach of this section to a
specific estimation problem.


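The asymptotic log-likelihood function of X is given by

$L(\theta) = -\frac{m}{2} \ln 2\pi - \frac{1}{2} \ln \det W - \frac{N}{2} (X-\bar{X})^T W^{-1} (X-\bar{X})$.   (3.2)

The ML estimate of $\theta$ obtained from a "sample" X drawn from the asymptotic
distribution (3.1) is, therefore, the solution of the following equation

$[I_{n\theta} \;\; 0]\, W^{-1} (X-\bar{X}) - \frac{1}{2} \begin{bmatrix} (X-\bar{X})^T \frac{\partial W^{-1}}{\partial \theta_1} (X-\bar{X}) \\ \vdots \\ (X-\bar{X})^T \frac{\partial W^{-1}}{\partial \theta_{n\theta}} (X-\bar{X}) \end{bmatrix} - \frac{1}{2N} \begin{bmatrix} \frac{\partial \ln \det W}{\partial \theta_1} \\ \vdots \\ \frac{\partial \ln \det W}{\partial \theta_{n\theta}} \end{bmatrix} = 0$   (3.3)
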

where $\theta_i$ is the i-th component of $\theta$, and where $I_{n\theta}$ denotes the
$n\theta \times n\theta$ unity matrix. Let us assume that (3.3) has a solution with respect
to $\theta$, say $\hat{\theta}_{ML}$. Under certain regularity conditions the ML estimate is
consistent and $|\hat{\theta}_{ML} - \theta| = O(1/\sqrt{N})$†. (We denote both the true and the
unknown parameter vectors by the same symbol $\theta$.) Determination of $\hat{\theta}_{ML}$
will in general be a highly intractable problem. In the following we derive an
approximation of order 1/N of $\hat{\theta}_{ML}$. For simplicity we assume that $m < \infty$.
However, similar results hold if $m \to \infty$ and $m/N \to 0$ (as $m, N \to \infty$) at an
"appropriate rate." The rate at which m should tend to infinity is not easy
to determine, and will be problem dependent [23]-[24],[37].

†We will use throughout the paper the notation $O(\epsilon)$ to denote a random
variable with standard deviation $K\epsilon$, where $\epsilon$ is sufficiently small and where
K is a (finite) constant independent of $\epsilon$. An estimate $\hat{W}$ satisfying
$|\hat{W} - W| = O(1/\sqrt{N})$ is sometimes called "root N consistent."

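For N large enough it follows from (3.1) that

$X - \bar{X} = O(1/\sqrt{N})$.

Thus we can rewrite (3.3) as

$\dfrac{1}{N} \dfrac{\partial L(\theta)}{\partial \theta} = [I_{n\theta} \;\; 0]\, W^{-1} \left( X - \begin{bmatrix} \theta \\ 0 \end{bmatrix} \right) + O(1/N) = 0$   (3.4)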


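Next we partition W and X as

$W = \begin{bmatrix} W_{11} & W_{12} \\ W_{12}^T & W_{22} \end{bmatrix}$,   $X = \begin{bmatrix} x \\ z \end{bmatrix}$,   (3.5)

where the upper blocks have $n\theta$ rows and the lower blocks $m - n\theta$ rows.

A standard result on the inverse of partitioned matrices gives

$W^{-1} = \begin{bmatrix} 0 & 0 \\ 0 & W_{22}^{-1} \end{bmatrix} + \begin{bmatrix} I \\ -W_{22}^{-1} W_{12}^T \end{bmatrix} (W_{11} - W_{12} W_{22}^{-1} W_{12}^T)^{-1} \begin{bmatrix} I & -W_{12} W_{22}^{-1} \end{bmatrix}$   (3.6)

It follows from (3.4)-(3.6) that an asymptotic approximation of order 1/N of
$\hat{\theta}_{ML}$ is given by

$\tilde{\theta} = x - W_{12} W_{22}^{-1} z$.   (3.7)
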

Now $\tilde{\theta}$ is not directly implementable since $W_{12}$ and $W_{22}$ generally depend
on $\theta$. However, since $z = O(1/\sqrt{N})$, see (3.1), we can replace $W_{ij}$ in
(3.7) by their consistent estimates $\hat{W}_{ij}$ without affecting the order of the
approximation. We can summarize the discussion above by the following lemma.

Lemma 3.1

Consider $\hat{\theta}$ given by

$\hat{\theta} = x - \hat{W}_{12} \hat{W}_{22}^{-1} z$   (3.8)

where x, $\hat{W}_{12}$, $\hat{W}_{22}$ and z are defined by (3.5). Then $\hat{\theta}$ is a simple
large-sample approximate (of order 1/N) solution of (3.3).
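
The estimator (3.8) is a single linear correction of x by the part of z that is correlated with it. A minimal sketch follows, assuming the statistic X and an estimate of W are already available as arrays; the function name and argument names are illustrative.

```python
import numpy as np


def large_sample_ml(X, W_hat, n_theta):
    """Large-sample approximate ML estimate (3.8):
    theta_hat = x - W12_hat @ inv(W22_hat) @ z,
    where X = [x; z] and W_hat is partitioned as in (3.5)."""
    X = np.asarray(X, dtype=float)
    W_hat = np.asarray(W_hat, dtype=float)
    x, z = X[:n_theta], X[n_theta:]
    W12 = W_hat[:n_theta, n_theta:]
    W22 = W_hat[n_theta:, n_theta:]
    # Solve W22 u = z instead of forming the inverse explicitly.
    return x - W12 @ np.linalg.solve(W22, z)


theta_hat = large_sample_ml([1.0, 0.5], [[2.0, 1.0], [1.0, 2.0]], n_theta=1)
print(theta_hat)
```

When the off-diagonal block of W is zero the correction vanishes and the estimate reduces to x itself.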


Since $|\hat{\theta} - \hat{\theta}_{ML}| = O(1/N)$, $\hat{\theta}$ has for $N \to \infty$ the same distribution as the ML
estimate $\hat{\theta}_{ML}$. In view of the asymptotic (for $m \to \infty$) efficiency of the ML
estimate $\hat{\theta}_{ML}$ we expect that under certain regularity conditions the covariance
matrix of the distribution of $\hat{\theta}$ will tend to the Cramer-Rao lower bound as
$m \to \infty$. However this is only a conjecture. To prove it in specific cases is
a challenging problem; see section 6 for the analysis of a particular case.


In the following we establish some general accuracy properties of $\hat{\theta}$.
It follows from (3.1), (3.7) and (3.8) that

$\sqrt{N}\,(\hat{\theta} - \theta) \xrightarrow{\text{dist}} \mathcal{N}(0, P_m)$ as $N \to \infty$,   (3.9a)

where

$P_m = W_{11} - W_{12} W_{22}^{-1} W_{12}^T$   (3.9b)

and where, for the convenience of the discussion, we stress by notation the
dependence of the covariance matrix (3.9b) on m.

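The estimation error $(\hat{\theta} - \theta)$ can be interpreted as being the residuals of
the asymptotic regression of $x - \theta$ on z. Consider the following regression
problem: determine $\hat{M}$ such that

$Q(M) \triangleq N E\{[(x-\theta) - Mz][(x-\theta) - Mz]^T\} \ge Q(\hat{M})$,   (3.10)

for any $n\theta \times (m - n\theta)$ matrix M. Since we have

$Q(M) = (M - W_{12} W_{22}^{-1}) W_{22} (M - W_{12} W_{22}^{-1})^T + W_{11} - W_{12} W_{22}^{-1} W_{12}^T$   (3.11)

and since $W_{11} - W_{12} W_{22}^{-1} W_{12}^T$ and $W_{22}$ are positive definite matrices, it follows
that

$\hat{M} = W_{12} W_{22}^{-1}$,   (3.12)

and

$Q(\hat{M}) = P_m$.   (3.13)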

According to the interpretation above we expect that the accuracy will
increase when m increases since the number of "degrees of freedom" in the
regression problem (3.10) increases with m. It can in fact be shown by
straightforward algebraic calculations that

$P_m \ge P_{\bar{m}}$, for $\bar{m} > m$.   (3.14)

We state this result in the following theorem.

Theorem 3.1. Consider the matrices $P_m$ and $P_{\bar{m}}$ defined by (3.9b) under the
assumption that $\bar{W}_{22}$ is invertible. Assume that

$\bar{z} = \begin{bmatrix} z \\ \zeta \end{bmatrix}$,   (3.15)

where z corresponds to $P_m$ and $\bar{z}$ to $P_{\bar{m}}$. Then the order relation (3.14)
holds true.

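Proof: The nested structure (3.15) induces a similar structure on the
matrices $\bar{W}_{12}$ and $\bar{W}_{22}$, say

$\bar{W}_{12} = [W_{12} \;\; S_1]$,   $\bar{W}_{22} = \begin{bmatrix} W_{22} & S_2 \\ S_2^T & S_3 \end{bmatrix}$.

Then

$P_{\bar{m}} = W_{11} - \bar{W}_{12} \bar{W}_{22}^{-1} \bar{W}_{12}^T$
$\quad = W_{11} - [W_{12} \;\; S_1] \left\{ \begin{bmatrix} I \\ 0 \end{bmatrix} W_{22}^{-1} [I \;\; 0] + \begin{bmatrix} -W_{22}^{-1} S_2 \\ I \end{bmatrix} S\, [-S_2^T W_{22}^{-1} \;\; I] \right\} \begin{bmatrix} W_{12}^T \\ S_1^T \end{bmatrix}$
$\quad = P_m - (S_1 - W_{12} W_{22}^{-1} S_2)\, S\, (S_1 - W_{12} W_{22}^{-1} S_2)^T$,   (3.16)

where

$S = (S_3 - S_2^T W_{22}^{-1} S_2)^{-1}$.

Since $\bar{W}_{22}$ is positive definite by assumption, S must also be positive
definite and the assertion of the theorem follows. We note from (3.16) that
the equality

$P_{\bar{m}} = P_m$   (3.17)

is equivalent to

$S_1 = W_{12} W_{22}^{-1} S_2$.   (3.18)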
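
The order relation (3.14) is easy to check numerically. The sketch below builds an arbitrary positive definite W (a random example, not taken from any estimation problem), computes $P_m$ from the leading m×m block for increasing m, and reports the smallest eigenvalue of each difference $P_m - P_{m+1}$, which should be nonnegative up to rounding. All names are illustrative.

```python
import numpy as np


def asymptotic_cov(W, n_theta, m):
    """P_m = W11 - W12 inv(W22) W12^T of (3.9b), using the leading
    m x m block of W (m > n_theta)."""
    W = np.asarray(W, dtype=float)[:m, :m]
    W11 = W[:n_theta, :n_theta]
    W12 = W[:n_theta, n_theta:]
    W22 = W[n_theta:, n_theta:]
    return W11 - W12 @ np.linalg.solve(W22, W12.T)


rng = np.random.default_rng(0)
G = rng.standard_normal((8, 8))
W = G @ G.T + 8.0 * np.eye(8)        # arbitrary positive definite matrix
n_theta = 2

P = [asymptotic_cov(W, n_theta, m) for m in range(n_theta + 1, 9)]
gaps = [np.linalg.eigvalsh(P[i] - P[i + 1]).min() for i in range(len(P) - 1)]
print(gaps)                           # each entry should be >= 0 (to rounding)
```

Each added component of z can only reduce (or leave unchanged) the residual covariance of the regression, which is the content of Theorem 3.1.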


Since $P_m \ge P_{m+1} \ge 0$ for all m, it follows that the sequence of matrices
$\{P_m\}$ will have a limit when $m \to \infty$, which we denote by $P_\infty$. According to
the interpretation (3.13) of $P_m$ the "rate of convergence" of $P_m$ to $P_\infty$ will
be faster than that of the covariance matrix corresponding to any other
estimator of $\theta$ of the form x + Mz, for some matrix M. Also, according to the
discussion following (3.8) we expect that

$P_\infty = P_{CR}$,   (3.19)

where $P_{CR}$ denotes the asymptotic Cramer-Rao lower bound for consistent
estimators of $\theta$. This conjecture is analyzed for the specific case of the
ARMA estimation problem in section 6.

Finally, note that the (consistent) estimate $\hat{\theta}$, (3.8), could be
introduced independently of the ML interpretation. For example, it could be
introduced using the (asymptotic) regression interpretation (3.10)-(3.13).
The accuracy properties proven above ((3.9), (3.13) and (3.14)) do not depend
on the ML interpretation of $\hat{\theta}$. The property (3.19) becomes, however,
apparent only in relation to such an interpretation. Yet this property has to
be proven in each particular case being considered. The application of the
maximum likelihood principle in this section is non-standard. The likelihood
function used here is valid only for $N \to \infty$. Moreover, it is not known
whether the likelihood function is valid for $m \to \infty$. Thus, we cannot rely
on the standard properties of the ML estimate to prove (3.19). Note that for
the specific ARMA problem considered in the next section we show that (3.19)
holds for Gaussian data, but not necessarily for other distributions. In view
of the discussion above, this should not be viewed as a contradiction to the
ML-based interpretation of $\hat{\theta}$.

4. LARGE-SAMPLE MAXIMUM LIKELIHOOD ARMA SPECTRAL ESTIMATION


In this section we consider the specific problem of estimating the
spectral density of an ARMA process (2.1). This problem reduces to estimating
the covariance parameters $\theta = [r_0, \ldots, r_{na+nc}]^T$ of the ARMA process, see
(2.4)-(2.6). We will use the approximate ML approach of the previous
section to estimate $\theta$ from a sample {y(1),...,y(N)}. We define the
unbiased sample covariances

$$\bar r_k = \frac{1}{N-k} \sum_{t=1}^{N-k} y(t)\, y(t+k) , \quad k = 0, 1, 2, \ldots , \qquad \bar r_{-k} = \bar r_k . \tag{4.1}$$
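As a minimal illustration of the unbiased sample covariances in (4.1), the sketch below evaluates them in plain Python. The function name `sample_covariances` and the assumption that the data have zero mean are ours and are not taken from the report.

```python
def sample_covariances(y, max_lag):
    """Unbiased sample covariances of a zero-mean sequence, cf. (4.1).

    Returns [r_0, ..., r_max_lag], where r_k averages y[t] * y[t + k]
    over the N - k available products (hypothetical helper).
    """
    n = len(y)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(y)")
    covs = []
    for k in range(max_lag + 1):
        total = sum(y[t] * y[t + k] for t in range(n - k))
        covs.append(total / (n - k))  # divide by N - k (not N): unbiased
    return covs


# Small usage example on an alternating sequence.
example = sample_covariances([1.0, -1.0, 1.0, -1.0], 2)
```

Dividing by N - k rather than N removes the bias but makes the high-lag values noisier, which is one reason m is kept to a fraction of N later in the text.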


Next  we  introduce  a  consistent  estimate  of  the  AR  parameters  {a^}  obtained  by 
the  least-squares  solution  of  the  following  overdetermined  Yule-Walker  system 
of  equations 


where $\bar a = [\bar a_1 \ldots \bar a_{na}]^T$. That $\bar a$ given by (4.2) is a consistent estimate of
$a = [a_1, \ldots, a_{na}]^T$ follows readily from (2.4) and from the convergence of
the sample covariances to the theoretical covariances $\{r_k\}$ [14],[23].

Note that the sample covariance matrix in (4.2) has full rank, at least for
sufficiently large N [23]. It may be advisable to take K in (4.2) to be much
larger than na+nc in order to improve the accuracy of $\bar a$ [7]. It is not
generally true that increasing K improves the accuracy of $\bar a$ [23]. However,
extensive simulations [7],[12] have shown that this is in general the case
when the sequence $\{|r_k|\}$ is decreasing slowly.

Next we define the statistic X which will constitute the "data" for our
ML estimation problem:

$$X = \begin{bmatrix} x \\ z \end{bmatrix} , \tag{4.3a}$$

where

$$x_i = \bar r_i , \quad i = 0, \ldots, na+nc ,$$
$$z_k = \sum_{i=0}^{na} \sum_{j=0}^{na} \bar a_i \bar a_j \bar r_{nc+na+k-i-j} , \quad k = 1, \ldots, m-na-nc-1 , \quad (\bar a_0 = 1) . \tag{4.3b}$$

We assume that m > na + nc + 1. The specific form above of the vector z leads
to a relatively simple expression for the covariance matrix of the asymptotic
distribution of X (see below). Other choices of z are possible but we believe
(4.3b) is the most convenient choice. This choice of z was introduced by
Walker [11].
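The construction of z can be sketched as follows. This is an illustrative reading of (4.3b) as a double sum over AR coefficients, assuming the convention $a_0 = 1$ and the symmetry $r_{-k} = r_k$; the function name `walker_z` and its argument layout are hypothetical.

```python
def walker_z(a_bar, r_bar, nc, m):
    """Entries z_1, ..., z_{m - na - nc - 1} of the statistic z, cf. (4.3b).

    a_bar : [a_1, ..., a_na], AR coefficient estimates (a_0 = 1 is implied)
    r_bar : [r_0, ..., r_{m-1}], the m sample covariances
    """
    na = len(a_bar)
    coeffs = [1.0] + list(a_bar)
    n_theta = na + nc + 1
    if len(r_bar) < m or m <= n_theta:
        raise ValueError("need m > na + nc + 1 and at least m covariances")
    z = []
    for k in range(1, m - n_theta + 1):
        total = 0.0
        for i, a_i in enumerate(coeffs):
            for j, a_j in enumerate(coeffs):
                lag = nc + na + k - i - j
                total += a_i * a_j * r_bar[abs(lag)]  # r_{-k} = r_k
        z.append(total)
    return z


# With exact AR(1) covariances r_k = 0.5**k (a_1 = -0.5, nc = 0) the
# Yule-Walker equations hold exactly, so every z_k is zero.
z_exact = walker_z([-0.5], [1.0, 0.5, 0.25, 0.125], nc=0, m=4)
```

The usage example reflects why z is informative: its entries vanish at the true parameters, so any non-zero value in a finite sample comes from estimation error.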

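It is shown in Appendix A that X in (4.3) is asymptotically normally
distributed, i.e.,

$$\sqrt{N}\,(X - \bar X) \xrightarrow{\text{dist}} \mathcal{N}(0, W) , \tag{4.4a}$$

where

$$\bar X = \begin{bmatrix} \theta \\ 0 \end{bmatrix} , \qquad W = \begin{bmatrix} W_{11} & W_{12} \\ W_{12}^T & W_{22} \end{bmatrix} , \tag{4.4b}$$

with the first block of dimension $n\theta = na + nc + 1$ and the second block of
dimension $m - n\theta$, and

$$[W_{11}]_{ij} = E\{[v(t-i) + v(t+i)]\, v(t-j)\} , \quad i,j = 0, \ldots, n\theta-1 , \qquad v(t) = \lambda \frac{C^2(q^{-1})}{A^2(q^{-1})} e(t) , \tag{4.4c}$$

$$[W_{22}]_{ij} = E\{A^2(q^{-1}) v(t-i)\, A^2(q^{-1}) v(t-j)\} = \text{the coefficient of } z^{i-j} \text{ in } \Big[ \sum_{k=-nc}^{nc} b_k z^k \Big]^2 , \quad i,j = 1, \ldots, m-n\theta . \tag{4.4d}$$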

^'^12^jk“  )[v(t+na+nc-j  )  +  v(t+na+nc+j  )]v(t-k)} 

“  ^  j  *  J“O>****^0*lf  1 1  •  •  •  y  j 


(4.4e) 


a  =  the  coefficient  of  z^  in  the  long  division  of  ^“''’C+na)  Jc^'j.c - 


A^(z'M 


It  is  not  difficult  to  see  that 




$$[W_{12}]_{jk} = 0 \quad \text{for} \quad k > nc - na + j . \tag{4.4f}$$

This implies that $[W_{12}]_{jk} = 0$ for $k > 2nc$. Note also that $W_{22}$ is a banded
Toeplitz matrix with the band width equal to 2nc + 1.
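The zero pattern of $W_{12}$ can be checked directly. The short derivation below is a sketch; it assumes $v(t) = \lambda\, C^2(q^{-1})/A^2(q^{-1})\, e(t)$ as in (4.4c), white noise e(t), a stable and causal $1/A^2(q^{-1})$, and the expectation form of (4.4e) that is used in Appendix A.

```latex
\begin{align*}
[W_{12}]_{jk}
  &= E\{A^2(q^{-1})[v(t-j) + v(t+j)]\, v(t-na-nc-k)\} \\
  &= \lambda\, E\{C^2(q^{-1})[e(t-j) + e(t+j)]\, v(t-na-nc-k)\}.
\end{align*}
% C^2(q^{-1}) e(t \mp j) involves only e(s) with  t \mp j - 2nc \le s \le t \mp j,
% while v(t-na-nc-k) involves only e(s) with  s \le t-na-nc-k.
% Since e is white, a term vanishes whenever  t \mp j - 2nc > t-na-nc-k:
\begin{align*}
\text{term with } e(t-j):&\quad \text{zero for } k > nc - na + j, \\
\text{term with } e(t+j):&\quad \text{zero for } k > nc - na - j.
\end{align*}
% Both terms vanish for k > nc - na + j (recall j \ge 0). Since j \le na + nc,
% it follows that [W_{12}]_{jk} = 0 for every k > 2nc.
```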


In (4.4d) and (4.4e) we have indicated simple ways for evaluating the
covariance matrices $W_{12}$ and $W_{22}$. Note that only these two matrices are of
interest in calculating the estimate, cf. (3.7). The matrices
$W_{12}$ and $W_{22}$ depend only on $\{a_i, i=1, \ldots, na\}$ and
$\{r_k, k=0,\ldots,na+nc\}$. Thus, consistent estimates of $W_{12}$ and $W_{22}$ can be
obtained by using in (4.4) the consistent estimates of $\{a_i\}$ and $\{r_k\}$ given
by (4.1) and (4.2).
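The evaluation of $W_{22}$ indicated in (4.4d) amounts to squaring a symmetric polynomial and reading off its coefficients. The sketch below follows that reading; the helper names and the input layout (`b` holding $b_0, \ldots, b_{nc}$ with $b_{-k} = b_k$) are assumptions made for illustration.

```python
def poly_mul(p, q):
    """Coefficient list of the product of two polynomials."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def w22_from_b(b, size):
    """Banded Toeplitz matrix whose (i, j) entry is the coefficient of
    z^(i-j) in [sum_{k=-nc}^{nc} b_k z^k]^2, cf. (4.4d)."""
    nc = len(b) - 1
    symmetric = [b[abs(k)] for k in range(-nc, nc + 1)]  # powers -nc..nc
    squared = poly_mul(symmetric, symmetric)             # powers -2nc..2nc

    def entry(i, j):
        d = i - j
        return squared[d + 2 * nc] if abs(d) <= 2 * nc else 0.0

    return [[entry(i, j) for j in range(size)] for i in range(size)]


# MA(1) example: C(q^-1) = 1 + 0.5 q^-1 with unit scale gives
# b_0 = 1.25 and b_1 = 0.5.
w22_example = w22_from_b([1.25, 0.5], 4)
```

Only 2nc + 1 distinct values on and below the diagonal are non-zero, which is the band structure noted in the text.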

It follows from the discussion above that X, (4.3), satisfies the basic
conditions used to develop the approximate ML approach of section 3. Thus, a
large-sample approximation of the ML estimate of $\theta$ is given by (lemma 3.1),

$$\hat\theta = x - \hat W_{12} \hat W_{22}^{-1} z . \tag{4.5}$$
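A minimal sketch of how (4.5) can be evaluated without forming $\hat W_{22}^{-1}$ explicitly is given below. It uses a plain Cholesky factorization, which relies on $\hat W_{22}$ being positive definite; it does not exploit the band structure, and all function names are hypothetical.

```python
import math


def cholesky(mat):
    """Lower-triangular factor L with L L^T = mat (mat symmetric, > 0)."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = mat[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def solve_spd(mat, rhs):
    """Solve mat * u = rhs by forward and backward substitution."""
    low = cholesky(mat)
    n = len(rhs)
    fwd = [0.0] * n
    for i in range(n):
        fwd[i] = (rhs[i] - sum(low[i][k] * fwd[k] for k in range(i))) / low[i][i]
    sol = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(low[k][i] * sol[k] for k in range(i + 1, n))
        sol[i] = (fwd[i] - tail) / low[i][i]
    return sol


def approximate_ml_estimate(x, z, w12, w22):
    """theta_hat = x - W12 * W22^{-1} * z, cf. (4.5)."""
    u = solve_spd(w22, z)
    return [x_i - sum(row[k] * u[k] for k in range(len(u)))
            for x_i, row in zip(x, w12)]
```

For large m a banded solver would be the natural replacement for `solve_spd`, in line with the remark in the text that the operation count can be kept proportional to m.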

The $W_{22}$ matrix is positive definite for any value of m (finite or infinite)
[27],[33]. More precisely, it can be shown that

$$\lambda_{\min}(W_{22}) \ge \lambda^4 \inf_\omega |C(e^{i\omega})|^4 , \tag{4.6}$$

$$\lambda_{\max}(W_{22}) \le \lambda^4 \sup_\omega |C(e^{i\omega})|^4 , \tag{4.7}$$

where $\lambda_{\min}(W_{22})$ and $\lambda_{\max}(W_{22})$ are the smallest and largest eigenvalues of the
matrix $W_{22}$, respectively. The equalities in both (4.6) and (4.7) hold in
the limit as $m \to \infty$ [33, Appendix F]. Due to assumption A1 we have
$W_{22} > 0$ for all m. Note, however, that if C(z) has zeroes near the unit
circle then the condition number of $W_{22}$ will be large for large values of
m. A similar situation will occur for $\hat W_{22}$. Some numerical problems may
arise in such a case in the implementation of the estimator defined by (4.5).
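The effect of zero locations on conditioning can be illustrated by evaluating the two bounds on a frequency grid. The sketch below assumes the bounds take the form $\lambda^4 \inf |C|^4$ and $\lambda^4 \sup |C|^4$ and that $C(q^{-1}) = 1 + c_1 q^{-1} + \ldots + c_{nc} q^{-nc}$; the function name and the grid size are illustrative choices.

```python
import cmath
import math


def ma_spectrum_bounds(c, lam=1.0, grid=2048):
    """Grid approximations of lam^4 * inf|C|^4 and lam^4 * sup|C|^4.

    c : [c_1, ..., c_nc], coefficients of C(q^-1) = 1 + c_1 q^-1 + ...
    """
    values = []
    for n in range(grid):
        omega = 2.0 * math.pi * n / grid
        z_inv = cmath.exp(-1j * omega)
        c_val = 1.0 + sum(c_k * z_inv ** (k + 1) for k, c_k in enumerate(c))
        values.append(lam ** 4 * abs(c_val) ** 4)
    return min(values), max(values)


# Zero at -0.5 (well inside the unit circle) versus -0.95 (close to it).
low_a, high_a = ma_spectrum_bounds([0.5])
low_b, high_b = ma_spectrum_bounds([0.95])
ratio_a = high_a / low_a
ratio_b = high_b / low_b
```

Because the bounds are attained as m grows, the ratio of the upper to the lower bound approximates the limiting condition number of $W_{22}$; it grows by several orders of magnitude when the zero moves from 0.5 to 0.95 in modulus.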

The algorithm for determining a large-sample ML estimate of $\phi(z)$ based
on (4.5) can be summarized as follows:

Step 1. Compute the sample covariances (4.1), and the initial estimate
$\bar a$ (4.2).

Step 2. Use $\{\bar r_k\}$, $\bar a$ in (2.5) to obtain initial estimates $\{\bar b_k\}$ and insert
them in (4.3) and (4.4) to compute x, z, $\hat W_{12}$ and $\hat W_{22}$. Compute improved
estimates $\{\hat r_k, k=0, \ldots, na+nc\}$ of the covariances by using (4.5).

Step 3. Use $\{\hat r_k, k=0, \ldots, na+nc\}$ in (2.4) with k=nc+1,...,nc+na, to obtain
an improved estimate $\hat a$ of the AR parameters. Then use $\hat a$ and
$\{\hat r_k, k=0, \ldots, na+nc\}$ in (2.6) to obtain the estimate $\hat\phi(z)$ of the ARMA
spectral density.

The calculations in steps 2 and 3 of the above algorithm can be repeated
using the improved estimates $\{\hat r_k\}$ and $\{\hat a_i\}$ obtained in step 3. For large N
this will have only a slight effect on the estimates. However, in the small
and medium sample cases the iteration of steps 2 and 3 may have a beneficial
effect on estimation accuracy.

The computational aspects related to the algorithm above are discussed in
detail in [32]. Here we note only that the facts that $\hat W_{22}$ is a banded
positive definite matrix and that $\hat W_{12}$ has few non-zero elements can be
exploited to get a computationally efficient algorithm (requiring a number of
arithmetic operations proportional to m) for implementing steps 2-3.

Some general accuracy properties of the estimates of the type given in
equation (4.5) have been derived in section 3. Analogous properties clearly
hold for the estimates $\hat a$ and $\hat\phi(z)$ obtained by the algorithm above. A more
detailed accuracy analysis of $\hat\theta$ and $\hat a$ will be presented later.

We conclude this section by noting that Walker [11], who used a somewhat
more cumbersome ML approach [33, Appendix E], arrived at estimates of the
correlations and of the AR parameters $\{a_i\}$ that are
similar to ours. Since Walker considered the estimation of $\{r_k/r_0\}$ instead
of $\{r_k\}$, our estimates and those of Walker cannot be easily compared.


5. ASYMPTOTIC ACCURACY PROPERTIES

In this section we derive explicit expressions for the asymptotic
covariance matrices of $\hat\theta$ and $\hat a$. The asymptotic properties of $\hat\phi(z)$ can be
analyzed similarly (pointwise).


It follows from the general analysis in section 3 that as $N \to \infty$ the
covariance matrix of the normalized estimation error $\sqrt{N}(\hat\theta - \theta)$ is given by

$$P^\theta_m = W_{11} - W_{12} W_{22}^{-1} W_{12}^T , \tag{5.1}$$

where the matrices $W_{ij}$ are defined by (4.4). Furthermore, according to
Theorem 3.1 we have

$$P^\theta_m \ge P^\theta_{m'} \quad \text{for } m' > m . \tag{5.2}$$

A consequence of (5.2) is that the sequence of positive definite matrices
$\{P^\theta_m\}$ has a limit when $m \to \infty$, which we denote $P^\theta_\infty$. An explicit expression
for $P^\theta_\infty$ is given in the following lemma.

Lemma 5.1: Consider the covariance matrix $P^\theta_m$ defined by (5.1), (4.4). Then


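$$P^\theta_\infty = W_{11} - \Omega \Omega^T , \tag{5.3a}$$

where $\Omega$ is an $n\theta \times 2nc$ matrix whose (i,j)-element is given by

$$\Omega_{ij} = E\Big\{ C^2(q^{-1}) [e(t-i) + e(t+i)]\, \frac{1}{A^2(q^{-1})} e(t-nc-na-j) \Big\} , \quad i = 0, \ldots, n\theta-1 , \; j = 1, \ldots, 2nc . \tag{5.3b}$$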


Proof ; 


Let 


j=0 


^  '  C^(z) 


(ho  =  1) 


(5.4) 


.-1  ... 


It  is  shown  in  [10]  that  for  m  ♦  »  the  (i ,j ) -el ement  of  given  by 




ij  >1  . 


(5.5) 


1  im  W22  =  ^  . 

in-H*  \ 

where  1)  is  the  following  infinite-dimensional  matrix 


Since 


(5.6a) 


(5.6b) 


the assertion of the lemma readily follows from (4.4e) and (5.1). Note that
the expression in (5.3b) becomes zero if j > 2nc.

In practice m cannot be too large. The computational burden increases in
proportion to m. Also, m must be only a fraction of the sample size N for
statistical "stability". The rate of convergence of $P^\theta_m$ to its limiting lower
bound $P^\theta_\infty$ given by (5.3) is, therefore, of interest. Due to the particular
structure of $W_{12}$ (4.4f), the rate of convergence of $P^\theta_m$ to $P^\theta_\infty$ is determined
essentially by the rate at which the left-top 2nc x 2nc block of $W_{22}^{-1}$ converges
as $m \to \infty$. The rate of convergence of the entries in that block depends
strongly on the location of the zeros of C(z), see (5.4)-(5.6). The closer
these zeros are to the unit circle, the slower is the convergence rate. The
parameters $\{a_i\}$ have a much smaller influence on the convergence rate of
$P^\theta_m$, via the elements of the non-zero block of $W_{12}$.


We conclude from the discussion above that in the case C(z) has zeros
well inside the unit circle we can get reasonably close to the lower bound
$P^\theta_\infty$ for relatively small values of m. If C(z) has zeros close to the unit
circle we may need to consider much larger values of m. This will be possible
only if we have a long sample at hand; otherwise we cannot attain the maximum
accuracy corresponding to $P^\theta_\infty$. Recall also that for m large and C(z) with
zeros close to the unit circle, the $\hat W_{22}$ matrix is likely to be
ill-conditioned.


Next we turn to the calculation of the asymptotic covariance matrix of
$\hat a$. We introduce the vector

$$\hat r = [\hat r_{nc+1}, \ldots, \hat r_{nc+na}]^T , \tag{5.7}$$

and the matrix $\hat R$ as defined by (2.7) with $\{r_k\}$ replaced by $\{\hat r_k\}$. The
estimate $\hat a$ can be written as a function of $\hat\theta$ as follows:

$$\hat a = -\hat R^{-1} \hat r . \tag{5.8}$$
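The step from covariance estimates to AR parameters in (5.8) is a small linear solve. The sketch below assumes that the matrix in (2.7), which is not reproduced in this section, has entries $r_{nc+i-j}$ for i, j = 1, ..., na, as implied by the Yule-Walker equations for lags nc+1, ..., nc+na; the function names are hypothetical.

```python
def solve_linear(mat, rhs):
    """Solve mat * sol = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(mat, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    sol = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * sol[k] for k in range(i + 1, n))
        sol[i] = (aug[i][n] - tail) / aug[i][i]
    return sol


def ar_from_covariances(r, na, nc):
    """AR estimate a = -R^{-1} r_vec, cf. (5.8).

    r : [r_0, ..., r_{na+nc}] covariance estimates (r_{-k} = r_k assumed).
    Assumed layout: R[i][j] = r_{nc+i-j}, r_vec[i] = r_{nc+i}, i, j = 1..na.
    """
    big_r = [[r[abs(nc + i - j)] for j in range(1, na + 1)]
             for i in range(1, na + 1)]
    r_vec = [-r[nc + i] for i in range(1, na + 1)]
    return solve_linear(big_r, r_vec)
```

For example, with covariances [1.0, 0.5, 0.0], na = 2 and nc = 0, the two Yule-Walker equations give $a_1 = -2/3$ and $a_2 = 1/3$.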


We can now state the following result.

Lemma 5.2: Consider the estimate $\hat a$, (5.8), where $\{\hat r_k\}$ are given by
(4.5). Let $P^a_m$ denote the asymptotic covariance matrix of $\sqrt{N}(\hat a - a)$. Then

$$P^a_m = R^{-1} ( Q_{11} - Q_{12} W_{22}^{-1} Q_{12}^T ) R^{-T} , \tag{5.9}$$

where $W_{22}$ is given by (4.4d), R by (2.7), and

$$[Q_{11}]_{ij} = E\{A(q^{-1}) v(t-i)\, A(q^{-1}) v(t-j)\} , \quad i,j = 1, \ldots, na , \tag{5.10}$$

$$[Q_{12}]_{ij} = E\{A^2(q^{-1}) v(t-i)\, A(q^{-1}) v(t-na-j)\} , \quad i = 1, \ldots, na , \; j = 1, \ldots, m-n\theta .$$


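Furthermore,

$$P^a_m \ge P^a_{m'} \quad \text{for } m' > m . \tag{5.11}$$

Finally, the limit covariance matrix $P^a_\infty = \lim_{m \to \infty} P^a_m$ exists and is given by

$$P^a_\infty = R^{-1} ( Q_{11} - \Gamma \Gamma^T ) R^{-T} ,$$

where

$$\Gamma_{ij} = E\Big\{ C^2(q^{-1}) e(t-i)\, \frac{1}{A(q^{-1})} e(t-na-j) \Big\} , \quad i = 1, \ldots, na , \; j = 1, \ldots, 2nc . \tag{5.12}$$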


Proof: 

See  Appendix  B. 

As stated earlier, our estimate $\hat a$ may differ from Walker's estimate for
finite samples. A careful comparison of $P^a_m$ with the expression given by
Walker for the covariance matrix of the estimate in [11] shows that they are
identical. Thus, the two estimates have the same asymptotic accuracy.

Note that the first term in (5.9), $R^{-1} Q_{11} R^{-T}$, is the covariance matrix
corresponding to the standard Yule-Walker estimate of a (i.e., $\bar a$ obtained
from (4.2) for K = na+nc), see [23]. Thus, the second term in (5.9) shows the
improvement in asymptotic accuracy that results by using $\{\hat r_k\}$ instead of
$\{\bar r_k\}$ in the basic Yule-Walker equations.

In the next section we compare the asymptotic accuracy of our estimates
to the Cramer-Rao lower bound. In the course of the analysis we also obtain
an interesting result relating the covariance matrix of $\hat a$ to the covariance
matrix of the optimal Yule-Walker estimate recently proposed in [23],[24].


6. COMPARISON WITH THE CRAMER-RAO LOWER BOUND

The following conjecture was introduced previously:

$$P^\theta_\infty = P^\theta_{CR} , \tag{6.1}$$

where $P^\theta_\infty$ is given by (5.3) and $P^\theta_{CR}$ is the CRLB for the covariance matrix of
any consistent estimator of $\theta$. In the sequel we present a proof of (6.1)
for the general ARMA case, using some results presented in [34], [35]. The
asymptotic (for $N, m \to \infty$) efficiency of the approximate ML estimators of the
type considered here was conjectured by Walker [11] and later by others
[16],[28], but no proof was provided, except in some special cases
[9],[37]. Explicit expressions for $P^\theta_{CR}$ are known [34],[36]. However, a direct
algebraic proof of the equivalence between $P^\theta_\infty$ in (5.3) and $P^\theta_{CR}$ appears to be
difficult. Instead we consider the following result introduced in [35].

Theorem  6.1 

A 

Let  9  be  the  following  estimate  of  the  ARMA  parameter  vector  9  (2.8), 

A 

9  =  argmin  V(9), 

9 

V(9)  =  « 

where 

[  ^  1 1m  N  covjn}  .  (6. 2d) 

N-w 

and  where  m  >  na+nc+1  .  Define 


(6.2a) 

(6.2b) 


P^  ^  11m  N  cov{9} 


(6.3) 


Then,  under  the  Gaussian  hypothesis 


11m  P_  =  P 


,3 


m-* 


m  CR 


(6.4) 


Proof:  See  [35]. 


It is also shown in [35] that the matrix defined in (6.3) is an asymptotic
lower bound on the covariance matrix of any estimator based on the m sample
covariances $\{\bar r_0, \ldots, \bar r_{m-1}\}$.


Next  we  state  and  prove  the  following  lemma. 


Lemma  6.1: 

A 

The  estimate  e  as  defined  by  (4.5)  and  the  estimate  9  defined  by  (6.2a) 
are  asymptotically  equivalent. 


Proof:  Let 


z  *  Cz, 


z 

m-na-nc 


where  are  defined  by  (A. 16).  Also  define 


X 


(6.5a) 


where  x  is  given  by  (4.3a).  Then 


X 


(6.5c) 


where  X,  n  defined  in  (4.4b)  and  (6.2c),  and  *  denotes  entries  whose 
values  are  not  important  for  this  proof.  Since  the  matrix  Y  is  nonsingular 
and  non-random  it  follows  that 


V(9)  ^  =  [X  -  X)V^(X-X) 


(6.6) 


where $W = \lim_{N \to \infty} N\, \mathrm{cov}(X - \bar X)$ is defined in (4.4) (see also Appendix A). Note
that we denote both the true and the unknown parameter vectors by the same
symbol $\theta$. The equality (6.6) holds for all admissible values of $\theta$. Thus
we have


2||-W'^(X-Xj  -  2CI  0]  W'^(X-X) 


—  ’-1  — 

(*-X)  I?-  (^-X) 


(6.7) 


A  A 

A  A 

next  note  that  8  is  a  root  M  consistent  estimate:  e-e  *  o(l//Nj  (see 


(6.3)).  Therefore 


[X-Xjl:  =  o(l//N] 

’a 

Furthermore,  it  follows  exactly  as  in  (A.l),  (A. 2)  that 

#1-  =  [»  #1  j  1  ■  “(IX-'N) 

'  9  9 

which  implies  that 

A  A 

XI:  •  x'l.  .i||;  (5  -  e)  ►  o(l/>i)  =  x  ♦  o(|) 

'9  9  '9 


(6.8) 


(6.9) 


(6.10) 


where  9  corresponds  to^  {a.}  (note  that  X  -  X|“  ,  see  (4.3)).  From  (6.7)- 

A  •  9 

(6.9)  it  follows  that  s  satisfies  the  following  equation 


[I  OlU'Vx  -  (S  ])  ♦  0{i)  .  0 


(6.11) 


Since  9  is  the  approximate  (of  order  1/N)  solution  of  an  equation  with 

A  A 

identical  dominant  term  (see  (3.4)),  we  conclude  that  9  -  9  *  o(l/N]  ,  and 
the  proof  is  finished. 

From Lemma 6.1 we conclude that $P^\theta_m$ coincides with the matrix defined in
(6.3). Thus, $\hat\theta$ is a minimum variance estimator in the class of estimators
based on m sample covariances $\{\bar r_0, \ldots, \bar r_{m-1}\}$. Furthermore, from Theorem 6.1
it follows that

$$\lim_{m \to \infty} P^\theta_m = P^\theta_{CR} . \tag{6.12}$$

Thus, $\hat\theta$ is an asymptotically (for $N, m \to \infty$) efficient estimator. An
immediate consequence of this fact is that both $\hat a$ and $\hat\phi(z)$ are
asymptotically efficient estimators. Of less importance is the fact that the
above results provide another way for introducing the estimator $\hat\theta$ (as a
large-sample approximation of the estimate defined by (6.2a)).

In the remaining part of this section we show that $P^a_m$ is equal to the
asymptotic covariance of the optimal YW estimator of a, introduced in [23].
Since the optimal YW estimator is asymptotically efficient [23], this equality
provides an alternative proof of the asymptotic efficiency of $\hat a$. The
equivalence between the covariance matrices of these two estimators is also
interesting in its own right.

Let us introduce the matrices $R_k$ (k x na) and $S_k$ (k x k), for $k \ge na$,

$$[R_k]_{ij} = E\{y(t-nc-i)\, y(t-j)\} , \quad i = 1, \ldots, k , \; j = 1, \ldots, na ,$$
$$[S_k]_{ij} = \lambda^2 E\{C(q^{-1}) y(t-i)\, C(q^{-1}) y(t-j)\} , \quad i,j = 1, \ldots, k , \tag{6.13}$$


and  defined  as 


(6.14) 


The  inverse  matrix  in  (6.14)  exists  for  any  k  >  na  [23]. 


The following result relating $P^a_m$ to $\tilde P_k$ (for a certain k) is essential
in proving that $P^a_\infty = P^a_{CR}$.

Theorem 6.2: Consider the covariance matrices $P^a_m$ and $\tilde P_k$ defined by (5.9)-
(5.10) and (6.13)-(6.14), respectively. Let m > na + nc + 1. Then

$$P^a_m = \tilde P_{m-nc-1} . \tag{6.15}$$


Proof:  See  Appendix  C. 


Note that $\tilde P_k$ ($k \ge na$) is the asymptotic covariance matrix of the
optimally weighted overdetermined Yule-Walker estimator (OWOYWE) (or the
asymptotically equivalent optimal IV estimate) recently introduced in [23].
Thus, $\hat a$ given by (5.8) and the OWOYWE of [23] based on m-nc-1 instruments
have the same accuracy, as $N \to \infty$. These two estimates seem, in fact, to be
asymptotically identical; however, in the finite-sample case they will in
general have different values.


The reason for the usefulness of equality (6.15) is that the convergence
of $\tilde P_k$ as $k \to \infty$ was studied in detail in [23]. In particular it was shown
there that under the Gaussian hypothesis

$$\tilde P_k \to P^a_{CR} \quad \text{as } k \to \infty . \tag{6.16}$$

An explicit expression for $P^a_{CR}$ was also given in [23]. The "rate of
convergence" of $\tilde P_k$ to $P^a_{CR}$ was also studied in [23] by means of some
numerical examples as well as analytical calculations. The results reported
there on the convergence rate in (6.16) support the statements we already made
in section 5: (i) the C-parameters have a much stronger influence on the
convergence rate than the A-parameters; (ii) the convergence is slow when
C(z) has zeros close to the unit circle.

7.  CONCLUSIONS 

We  developed  a  technique  for  estimating  the  spectral  parameters  of  an 
ARMA  process  from  a  set  of  sample  covariances.  The  proposed  algorithm 
provides  consistent  parameter  estimates.  Explicit  expressions  were  derived 
for  the  asymptotic  covariances  of  the  parameter  estimates.  It  was  shown  that 
the  estimates  of  the  ARMA  parameters  obtained  by  this  technique  are 
asymptotically  efficient. 

The  computational  requirements  of  the  proposed  technique  are  of  the  same 
order  as  those  of  the  modified  Yule-Walker  estimator.  A  more  detailed 
discussion  of  the  computational  and  implementation  aspects  of  this  algorithm 
and  a  numerical  performance  evaluation  will  be  presented  in  [32]. 


APPENDIX A: PROOF OF EQUATION (4.4)


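Let us consider the following Taylor series expansion of $z_k(\bar a)$, (4.3b),
(viewed as a function of $\bar a$), around a:

$$z_k(\bar a) = z_k(a) + \sum_{s=1}^{na} \frac{\partial z_k(\bar a)}{\partial \bar a_s} \Big|_{\bar a = a} (\bar a_s - a_s) + O(1/N) , \tag{A.1a}$$

where

$$z_k(a) = \sum_{i=0}^{na} \sum_{j=0}^{na} a_i a_j \bar r_{nc+na+k-i-j} , \quad k \ge 1 , \tag{A.1b}$$

and

$$\frac{\partial z_k(\bar a)}{\partial \bar a_s} \Big|_{\bar a = a} = 2 \sum_{j=0}^{na} a_j \bar r_{nc+na+k-s-j} , \quad k \ge 1 , \; 1 \le s \le na . \tag{A.1c}$$

According to the Yule-Walker equations (2.4) the derivative (A.1c) is
$O(1/\sqrt{N})$. Since $\bar a_s - a_s$ is also $O(1/\sqrt{N})$, it then follows from (A.1a) that

$$z_k(\bar a) = z_k(a) + O(1/N) . \tag{A.2}$$

Thus, the random variables $z_k(\bar a)$ and $z_k(a)$ have the same asymptotic behavior, and
in the following calculations we will consider $z_k(a)$ instead of $z_k(\bar a)$.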

Under the assumptions imposed on the ARMA process (2.1) it is well known
that for any finite k

$$\sqrt{N}\, [\bar r_0 - r_0, \ldots, \bar r_k - r_k]^T \xrightarrow{\text{dist}} \mathcal{N}(0, V) , \tag{A.3a}$$

where

$$[V]_{ij} = \sum_{\tau=-\infty}^{\infty} ( r_\tau r_{\tau+j-i} + r_\tau r_{\tau+i+j} ) , \tag{A.3b}$$

see [14],[15]. Since $\{x_k\}$, (4.3b), and $\{z_k(a)\}$, (A.1b), are linear
combinations of $\{\bar r_j\}$, the convergence in distribution of X, (4.3), to a
Gaussian distribution follows from (A.3). It remains to verify the expression
of the covariance matrix of the limiting distribution, given in (4.4). Note
that formulae analogous to (4.4) have been given, without any proof, in [11].
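The limiting covariance in (A.3b) can be evaluated numerically for a covariance sequence of finite support. The sketch below reads (A.3b) as the sum over $\tau$ of $r_\tau r_{\tau+j-i} + r_\tau r_{\tau+i+j}$; the function name, the truncation parameter `tau_max` and the convention that covariances beyond the stored lags are zero are our assumptions.

```python
def bartlett_entry(r, i, j, tau_max):
    """Truncated evaluation of [V]_ij in (A.3b).

    r : [r_0, r_1, ...]; covariances beyond the stored lags are taken as 0
    tau_max : the infinite sum is truncated to |tau| <= tau_max
    """
    def cov(k):
        k = abs(k)  # r_{-k} = r_k
        return r[k] if k < len(r) else 0.0

    return sum(cov(tau) * cov(tau + j - i) + cov(tau) * cov(tau + i + j)
               for tau in range(-tau_max, tau_max + 1))


# White noise with unit variance: N var(r_0) -> 2 and N var(r_1) -> 1.
v00_white = bartlett_entry([1.0], 0, 0, tau_max=5)
v11_white = bartlett_entry([1.0], 1, 1, tau_max=5)

# MA(1) covariances r_0 = 1.25, r_1 = 0.5.
v00_ma1 = bartlett_entry([1.25, 0.5], 0, 0, tau_max=5)
```

For a process whose covariances decay slowly the truncation error can be significant, which is the difficulty the contour-integral evaluation mentioned later in this appendix is meant to avoid.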
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Proof  of  (4.4c) 
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Note  from  (A. 3)  that 
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(A. 5) 
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Thus,  is  equal  to  the  covariance  at  lag  k  of  the  process 

$$v(t) = \lambda \frac{C^2(q^{-1})}{A^2(q^{-1})} e(t) , \tag{A.7}$$

and the proof of (4.4c) is concluded.


Note that to compute the estimate (4.5) we do not need to consistently
estimate $W_{11}$. However, a need for calculating $W_{11}$ could arise if we want
to compute the covariance matrix $P^\theta_m$, (5.1), or its limit as $m \to \infty$,
(5.3). To evaluate the entries of $W_{11}$ we cannot proceed by "long division"
as we did for $W_{12}$ and $W_{22}$, see (4.4). The reason is that the coefficient
of $z^k$ in the infinite division of $\phi^2(z)$ cannot be computed without
truncation errors. Instead, we can calculate these coefficients as


C^(z)C^(z‘M 

a2{z)a2{z‘M 


T7^  r^zTiiP^  T  • 


(A. 8) 


by  using  an  exact  algorithm  for  evaluating  complex  integrals  given  in  [26]. 


Proof  of  (4.4d) 


We  have  that,  cf.  (A. 2)  and  (A. 5) 
na  na  na  na 

’  klo  lo  lo  i-  W.i.k.p 

"'"na+nc+i-k-p-' ^’^na+nc+j -z-s  ~  “^na+nc+j -t-s^^ 


na  na  na  na 


I  y  yya.aaaCo...  + 

k=0  p=0  i=0  s=0  ^  P  ^  5  J-i-k*P-4-s 



+  *  ]  . 
2na+2nc+i+j-ic-p-4-s 


(A. 9) 


Let  us  denote  the  two  terms  in  (A. 9)  by  and  .  According  to  the 


interpretation  (A. 7)  of  we  can  write 


na 
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and 
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'tie  thus  get 
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CW22]ij  =  E{A  (q  )  v{t-i)A‘{q  ")  v(t-j)}  , 


(A. 12) 


which  is  the  expression  given  in  (4.4d).  To  complete  the  proof  of  {4.4d)  we 


note  that  ^  written  as 
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Proof  of  {4.4e) 


It  follows  from  (A. 2),  (A. 5)  and  (A. 7)  that 
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+  v(t+i-k-p)v(t-na-nc-j )] }  = 

=  E[A^(q  M[v{t-i)+v(t+i)]v(t-na-nc-3)|  , 
which  can  also  be  written  as 
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APPENDIX  B:  PROOF  OF  LEMMA  5.2 


Let 


Q  ^  3a(8  ) 


36 


6=9 


Then 
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ni  m 


where  p^  is  given  by  (5.1).  Some  straightforward  calculations  give 
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Thus,  D  =  -R  G  ,  where 


G  ^  4  (R  a  +  r} 
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9=9 


The  (j  .k)  element  of  G  is  given  by 
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na 


L=0  ,k  "  V+j+k'^^nc+j-k 


.  k  ^  0  , 


na 


(B.5 


=  a  .,  k*0, 

1=0  ^  nc+j-1,0  nc+j 

for  j=l,...,na,  k  =  0,...,na+nc. 


In (B.5) we have set $a_k = 0$ for k > na and k < 0, and $a_0 = 1$.

Next we evaluate the matrix products $G W_{12}$ and $G W_{11} G^T$ which appear in
(B.2). The (i,j) element of $G W_{12}$ is given by, cf. (4.4e),
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,  i— l,...,na  ,  j— 1,»«., m— n 0  , 


where  cf.  (B.5), 
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It  follows  that 
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The  matrix  GW, .G  could  be  evaluated  by  similar  calculations.  However,  it  is 
more convenient to note that $G W_{11} G^T$ is the covariance matrix of Ra + r. In
effect the following equality holds:


Ra  +  r  =  G{9-6)  • 


(B.8) 


The (i,j) element of $G W_{11} G^T$ is, therefore, given by, cf. (4.4c) and (B.8),
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Denote  the  two  terms  in  (B.9)  by  T^  and  T^^  . 
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=  E{v(t-j)A^(q"Mv(t+2nc+i)}  =  E{v(t-j  ).;^C^(q‘^)e(t+2nc+i )}  »  0  . 


Thus, we have shown that

$$[G W_{11} G^T]_{ij} = E\{A(q^{-1}) v(t-i)\, A(q^{-1}) v(t-j)\} , \quad i,j = 1, \ldots, na . \tag{B.11}$$

The expression (5.9), (5.10) of $P^a_m$ now readily follows from (5.1), (B.2),
(B.3), (B.7) and (B.11). The inequality (5.11) is a simple consequence of
(5.2) and (B.2). Finally, the expression of $P^a_\infty$ in (5.12) follows from (5.3),
(B.2), (B.3), (B.11), the relation $\Gamma = G\Omega$ and some calculations similar to
(B.6)-(B.7).


APPENDIX C: PROOF OF THEOREM 6.2


Let  H  denote  the  following  nonsingular  (m-nc-l)  x  (m-nc-1)  matrix 
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Recall  that  ne  =  na  +  nc  +  1  .  For  1  <  k  <  m  -  ne  and  1  <;  j  <  na  we  have 
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It  follows  that 
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with  R  =  defined  by  (2.7). 
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Next  we  introduce  the  reciprocal  polynomial  of  A(2) 
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Next  note  that 
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APPENDIX  D;  A  PARAMETRIZATION  OF  THE  SPECTRAL  DENSITY 
FUNCTION  OF  AN  ARMA  PROCESS 


From the Yule-Walker equations, (2.4), it follows that ($a_0 = 1$)
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In  establishing  the  last  equality  in  (0.1)  we  tacitly  assumed  that 


nc  >  na  . 


(D.2) 


When (D.2) does not hold, the derivation needs to be modified and the
following expressions become more complicated. Let
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From  (D.l)  it  follows  that 
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we  get 


nc 


;(z) 


J.Pk^ 
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A(2"M 


(0.7) 


The spectral density $\phi(z)$ is, therefore, completely determined by


9* 


•  •  •  ♦ 


(D.8) 


cf. (D.4)-(D.7). The parametrization via (D.8) of an ARMA process was used by
Walker [11]. Cadzow [12] presented the explicit dependence of $\phi(z)$ on (D.8)
as in (D.4), (D.6) and (D.7); see also [13]. Cadzow's derivation of (D.6),
(D.7), however, provides less insight into the problem than the derivation
above. Unfortunately, (D.6) and (D.7) rely on the assumption (D.2). If this
assumption is not valid then $\phi(z)$ will have a more complicated expression
than (D.6)-(D.7).


APPENDIX  E:  AN  EXTENDED  MAXIMUM  LIKELIHOOD  ESTIMATION  PROBLEM 
AND  ITS  LARGE-SAMPLE  SOLUTION 

In this appendix we present a generalized version of the ML estimation
problem introduced in section 3. The large-sample solution of the generalized
ML problem can be obtained in a similar manner. The results of this appendix
cover Walker's approach [11]. Even though these results are not used directly
in the paper, we believe that they are useful in deriving new estimators in
some specific estimation problems.


Consider a random m-vector $\hat{X}$ which is asymptotically normally
distributed such that, for some $X$ to be specified,

    $\sqrt{N}\,(\hat{X} - X) \to \mathcal{N}(0, W)$ ,    (E.1)

where N denotes the length of the sample used to construct $\hat{X}$. Let
$\theta$ be the parameter vector to be estimated. Assume that $\theta$
completely determines the asymptotic distribution (E.1) of $\hat{X}$. In
contrast to the treatment in section 3 we now allow $\hat{X}$ to depend on
$\theta$. However, we impose some restrictions on this dependence. Thus, let
$\hat{X}$ be partitioned as


X 


ne  *  dim  9  . 


We  assume  that 


X 

where  x  -  x  has  the  form 


} 

0 


ne 


(E.2a) 


X  -  X  *  -Be  +  r  , 


(E.2b) 


where B is a nonsingular (at least for $N \to \infty$) matrix, and where B and r
depend on the data only. Furthermore, we assume that there exists $\hat{z}$
such that

    $\|\hat{z} - z\| = O(1/N)$ ,    (E.3a)

and $\hat{W}$ such that

    $\|\hat{W} - W\| = O(1/\sqrt{N})$ ,    (E.3b)

where both $\hat{z}$ and $\hat{W}$ depend only on the data at hand.


Note that assumption (E.3b) is fairly weak. The matrix $\hat{W}$ may be taken
as $W(\hat\theta)$ with $\hat\theta$ a consistent estimate of $\theta$. Similarly,
(E.3a) will be satisfied by taking $\hat{z} = z(\hat\theta)$ provided that

    $\| \partial z(\theta)/\partial\theta \| = O(1/\sqrt{N})$ .    (E.3c)

This can be seen from the following Taylor series expansion

    $z(\hat\theta) = z(\theta) + [\partial z(\theta)/\partial\theta](\hat\theta - \theta) + O(1/N) = z(\theta) + O(1/N)$ ,    (E.4)

where the second equality follows from (E.3c). Satisfying the conditions
(E.2) and (E.3c) in a given application requires careful choice of X (for a
specific example see [11]).

Under the conditions above we derive a simple large-sample approximation
of the ML estimate of $\theta$, in the manner of section 3. The asymptotic
log-likelihood function of $\hat{X}$ is given by (3.2). Paralleling the analysis in
section 3 we obtain an asymptotically valid approximation of the derivative
with respect to $\theta$ of the log-likelihood function:

^4?^“  [B,0(1/NJ]W’^{X-X)  +  0(1/N)  = 


which  gives 


(E.8) 


9  a  B"^{r-W^2M22  ’ 

Concerning the asymptotic accuracy properties of $\hat\theta$, (E.8), we can prove
results analogous to those of section 3. To save space we shall omit the
details.


APPENDIX F: THE NONSINGULARITY OF $W_{22}$ AND $\hat{W}_{22}$


In this appendix we analyze a condition which was tacitly assumed to
hold. It was assumed that the inverses $W_{22}^{-1}$ and $\hat{W}_{22}^{-1}$ of the
(m-ne) x (m-ne) matrices $W_{22}$ and $\hat{W}_{22}$ exist. Since we let m tend to
infinity this assumption should be analyzed with some care. Indeed, some
eigenvalues of these matrices might tend to zero as $m \to \infty$ and then, even
though the matrices are nonsingular for any finite m, they may be
ill-conditioned for large m. To address these issues we state the following
result.


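Lemma F.1. Consider the m x m matrix $W_{22}$ given by (4.4d). Let
$\{\lambda_j\}$ denote the eigenvalues of $W_{22}$ and let

    $\lambda_{\min}^{(m)} = \inf_j \{\lambda_j\}$ ,  $\lambda_{\max}^{(m)} = \sup_j \{\lambda_j\}$ .    (F.1)

Then

    $\lambda_{\min}^{(m)} \ge \lambda_{\min}^{(m+1)}$ ,  $\lambda_{\max}^{(m)} \le \lambda_{\max}^{(m+1)}$ ,    (F.2)

    $\sigma_{\min} \triangleq \lim_{m\to\infty} \lambda_{\min}^{(m)} = \lambda \inf_{\omega} |C(e^{i\omega})|^2$ ,    (F.3a)

    $\sigma_{\max} \triangleq \lim_{m\to\infty} \lambda_{\max}^{(m)} = \lambda \sup_{\omega} |C(e^{i\omega})|^2$ .    (F.3b)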


Proof: The inequalities in (F.2) are direct consequences of the fact that as
m increases, the sequence of $W_{22}$ matrices is a sequence of nested
non-negative definite matrices. We will now prove (F.3a). (The proof of
(F.3b) is similar.) Let $\gamma$ be a real number, and consider the matrix
$W_{22} - \gamma I$. The (k,p) element of this matrix is given by

    $\frac{\lambda}{2\pi} \int_{-\pi}^{\pi} |C(e^{i\omega})|^2 e^{i\omega(k-p)} d\omega - \gamma \delta_{k,p} = \frac{1}{2\pi} \int_{-\pi}^{\pi} [\lambda |C(e^{i\omega})|^2 - \gamma] e^{i\omega(k-p)} d\omega$ .    (F.4)

If $\gamma$ is such that

    $\lambda |C(e^{i\omega})|^2 - \gamma \ge 0$ ,  for $\omega \in (-\pi, \pi]$ ,    (F.5)

then it follows from (F.4) that $W_{22} - \gamma I$ is the covariance matrix of a
moving-average process with a covariance generating function equal to the
left-hand side of (F.5). Thus

    $W_{22} \ge \gamma I$ ,  for all m .    (F.6)

If (F.5) does not hold, (F.6) cannot be true. Now, $\sigma_{\min}$ is uniquely
defined by the following two conditions:

    $W_{22} \ge \sigma_{\min} I$ ,  for all m ,    (F.7a)

    $W_{22} \ge (\sigma_{\min} + \varepsilon) I$ ,  $\varepsilon > 0$ , cannot hold for all m .    (F.7b)

From the above discussion it readily follows that $\sigma_{\min}$ is given by
(F.3a).


Since we assumed that C(z) has no zeros on the unit circle (A1) we
conclude from the lemma above that $\sigma_{\min} > 0$. Thus, $W_{22}^{-1}$ exists for
any value of m (finite or infinite). However, note that if the polynomial C(z)
has zeros close to the unit circle then some numerical problems may be
expected. Indeed in such a case $\sigma_{\min}$ will be small and then $W_{22}$ may be
ill-conditioned for large m, cf. (F.3). Since $\hat{W}_{22}$ is a consistent estimate
of $W_{22}$ we expect that the discussion above applies to $\hat{W}_{22}$ as well,
provided that N is sufficiently large.


In the small-sample case some additional care may be needed. The matrix
$\hat{W}_{22}$ is obtained from (4.4d) where $\{b_k\}$ are replaced by $\{\hat{b}_k\}$
computed from $\{\hat{a}_i\}$ and $\{\hat{r}_j\}$ via (2.5). When C(z) has zeros close to
the unit circle it may happen that the estimated symmetric polynomial

    $\hat{B}(z) = \sum_{k=-nc}^{nc} \hat{b}_k z^{-k}$     (F.8)

has zeros on the unit circle. This is, for example, the case whenever
$\hat{B}(z)$ is not factorizable. As is known, the polynomial $\hat{B}(z)$ will have
in this case complex zeros with odd degree of multiplicity on the unit circle.
For $\hat{W}_{22}$, (F.3a) becomes

    $\hat\sigma_{\min} \triangleq \lim_{m\to\infty} \lambda_{\min}(\hat{W}_{22}) = \inf_{\omega} |\hat{B}(e^{i\omega})|$ .    (F.9)

Thus, if (F.8) has zeros on the unit circle then we get from (F.9) that
$\hat\sigma_{\min} = 0$ and, therefore, we expect $\hat{W}_{22}$ to be very ill-conditioned
for large m. To avoid such cases we may need to determine the zeros of (F.8)
and perform some correction on those which are on, or too close to, the unit
circle.
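
The conditioning argument of this appendix can be checked numerically. The
sketch below is only an illustration under stated assumptions: it takes a
first-order moving-average polynomial C with a zero close to the unit circle
(a hypothetical choice), builds the Toeplitz covariance matrix of the
corresponding moving-average process for several sizes m, and compares its
extreme eigenvalues with the minimum and maximum of the spectrum on a
frequency grid. All function names are ours, and the noise variance is set
to one.

```python
import numpy as np

def ma_covariances(c, lam=1.0):
    """Covariances b_0..b_nc of y(t) = C(q^-1) e(t), with var e(t) = lam."""
    c = np.asarray(c, dtype=float)
    return np.array([lam * np.dot(c[: len(c) - k], c[k:]) for k in range(len(c))])

def toeplitz_cov(b, m):
    """m x m symmetric Toeplitz covariance matrix built from b_0..b_nc."""
    first_row = np.zeros(m)
    n = min(m, len(b))
    first_row[:n] = b[:n]
    lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    return first_row[lags]

def spectrum_extremes(c, lam=1.0, ngrid=4097):
    """Min and max of lam * |C(e^{iw})|^2 on a grid over [-pi, pi]."""
    w = np.linspace(-np.pi, np.pi, ngrid)
    C = np.polyval(np.asarray(c, dtype=float)[::-1], np.exp(1j * w))
    s = lam * np.abs(C) ** 2
    return s.min(), s.max()

c = [1.0, -0.9]  # hypothetical C(z) = 1 - 0.9 z, zero close to the unit circle
b = ma_covariances(c)
s_min, s_max = spectrum_extremes(c)
for m in (5, 20, 200):
    eig = np.linalg.eigvalsh(toeplitz_cov(b, m))  # eigenvalues in ascending order
    print(m, eig[0], eig[-1], "spectrum range:", s_min, s_max)
```

As m grows, the smallest eigenvalue should decrease towards the spectral
minimum and the largest should increase towards the spectral maximum, which
is why a zero of C(z) near the unit circle leads to ill-conditioning.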
1.  INTRODUCTION


The  problem  of  estimating  the  parameters  of  sinusoidal  signals  from  noisy 
data  has  received  considerable  attention  recently  [1].  The  sinusoid 
parameters  can  be  estimated  using  correlation  based  techniques.  These  include 
Prony's  method,  Pisarenko's  harmonic  decomposition  procedure,  and  the  Yule- 
Walker  method  in  one  of  its  many  versions. 

Prony's  Method  (see  [2]  for  a  recent  survey)  is  known  to  give 
inconsistent  estimates.  It  cannot  be  used  in  cases  with  a  low  signal-to-noise 
ratio  since  the  resulting  estimates  may  be  highly  biased.  In  Pisarenko's 
procedure  [2]  this  problem  is  eliminated.  This  method  gives  consistent 
estimates,  but  in  some  cases  it  has  poor  accuracy. 

The  basic  Yule-Walker  method  [1],[2]  does  not  eliminate  this  deficiency 
of  Pisarenko's  method.  It  gives  consistent  estimates,  but  its  accuracy  may  be 
poor.  Since  the  Yule-Walker  method  is  attractive  from  the  computational 
standpoint,  much  effort  has  been  spent  in  recent  years  to  improve  its  accuracy 
properties. 

The  overdetermined  or  high-order  Yule-Walker  method  is  a  modification  of 
the  basic  Yule-Walker  procedure,  which  was  reported  to  lead  to  a  considerable 
increase  in  resolution  [3], [4], [5], [6].  This  method  was  proposed 
heuristically,  and  the  properties  of  the  corresponding  estimates  were  analyzed 
by  Monte-Carlo  simulations  only.  The  reasons  for  the  increase  of  the 
parameter  estimation  accuracy  when  the  number  of  Yule-Walker  equations  and  the 
model  order  are  increased,  were  not  too  well  understood.  In  [11]  and  [12]  we 
have  tried  to  fill  this  gap.  Very  briefly,  the  conclusions  of  [11], [12]  are 
that  the  asymptotic  accuracy  of  the  Yule-Walker  estimates  will  increase  with 
the  number  of  Yule-Walker  equations  used  and  with  the  model  order,  although 
not  necessarily  monotonically.  However,  even  when  the  number  of  Yule-Walker 
equations  and  the  model  order  are  increased  without  bound,  the  limiting 
accuracy  may  still  be  worse  than  that  corresponding  to  the  Cramer-Rao  lower 
bound  (CRLB).  Thus,  in  general,  it  is  possible  to  improve  the  accuracy  of  the 
Yule-Walker  based  estimates. 
In this paper we consider the following procedure for estimating the
parameters of sinusoids in noise. We use the overdetermined Yule-Walker (OYW)
method to get initial estimates of the sinusoid parameters. These are then
used as the starting point in a Gauss-Newton algorithm for maximizing the
likelihood  function  (under  the  assumption  that  the  measurement  noise  is 
Gaussian).  Since  the  OYW  method  provides  good  initial  estimates,  the  Gauss- 
Newton  algorithm  needs  relatively  few  iterations  to  converge.  Also,  the 
problem  of  convergence  to  local  maxima  is  not  likely  to  occur.  Furthermore, 
we  show  a  way  to  considerably  simplify  the  Gauss-Newton  algorithm.  The 
simplified  algorithm  is  also  more  stable  from  the  numerical  point  of  view. 

Yet  it  has  the  same  convergence  point  and,  at  least  asymptotically,  the  same 
convergence  rate  as  the  original  Gauss-Newton  algorithm.  We  show  by  means  of 
a  number  of  Monte-Carlo  simulations  that  the  (simplified)  maximum-likelihood 
(ML)  Gauss-Newton  algorithm  has  better  resolution  than  the  OYW  method. 
Comparisons  with  the  asymptotic  CRLB  are  also  included. 

Some  studies  related  to  the  present  paper  were  reported  in  [7]  and  [8]. 

In  [7]  an  approximate  ML  method  is  discussed.  A  relatively  simple  numerical 
algorithm  is  obtained,  at  the  cost  of  sacrificing  some  accuracy.  The  method 
proposed  here  is  of  comparable  complexity,  but  has  better  asymptotic 
accuracy.  Reference  [8]  presents  a  performance  comparison  of  several 
estimation  techniques  based  on  linear  prediction  and  on  Singular  Value 
Decomposition. 

The  outline  of  the  paper  is  as  follows.  In  section  2  we  state  the 
problem  considered  here.  Section  3  contains  a  brief  review  of  the 
overdetermined  Yule-Walker  method  for  estimating  the  sinusoid  parameters. 

This  method  is  used  to  provide  initial  estimates  for  the  proposed  maximum 
likelihood  method,  which  is  described  in  section  4.  The  asymptotic  properties 
and  some  computational  aspects  of  both  methods  (OYW  and  ML),  are  briefly 
discussed.  The  problem  of  local  minima  of  the  cost  function  being  minimized 
in  the  proposed  method,  is  discussed  in  section  5.  Numerical  examples 
illustrating  the  performance  of  the  proposed  technique  are  presented  in 
section  6. 


2.  STATEMENT  OF  THE  PROBLEM 


Consider the following sinusoidal signal

    $x(t) = \sum_{i=1}^{m} \alpha_i \sin(\omega_i t + \phi_i)$ ,    (2.1a)

where

    $\alpha_i, \phi_i \in R$ ,  $\omega_i \in (0, \pi)$  and  $\omega_i \ne \omega_j$ for $i \ne j$ .    (2.1b)

The assumption $\omega_i \ne 0$ means that a possible non-zero constant level of
x(t) has been removed. The condition $\omega_i < \pi$ is a consequence of Shannon's
sampling theorem.

Let y(t) denote the noise-corrupted measurements of x(t)

    $y(t) = x(t) + e(t)$ ,    (2.2)

where {e(t)} is a sequence of independent and identically distributed
Gaussian random variables with zero mean and variance $\lambda^2$. We assume that
x(t) and e(s) are independent for any t and s.

The  assumption  that  e(t)  is  Gaussian  may  appear  somewhat  restrictive. 
Under  the  Gaussian  hypothesis  it  is  easy  to  write  the  likelihood  function  of 
the  data  and  to  obtain  an  explicit  expression  for  the  CRLB.  If  in  some 
application  the  Gaussian  hypothesis  fails  to  be  true,  the  algorithm  of  this 
paper  is  still  applicable,  but  it  will  no  longer  provide  the  ML  estimates. 
Nevertheless,  the  estimates  obtained  by  using  the  algorithm  will  still  give 
the  minimum  variance  in  a  fairly  large  class  of  estimators  whose  covariance 
matrices  depend  only  on  the  second  order  statistics  of  the  data.  This  is 
explained  further  in  section  4. 

Next we denote by $r_n$ the covariance of y(t) at lag n (n = 0, 1, 2, ...)

    $r_n = E\{y(t) y(t-n)\}$ .    (2.3)

The operator E{·} denotes statistical expectation. The sample covariances
corresponding to (2.3) shall be denoted by $\hat{r}_n$. We will use the following
definition of $\hat{r}_n$

    $\hat{r}_n = \frac{1}{N-n} \sum_{t=1}^{N-n} y(t) y(t+n)$ ,  n = 0, 1, 2, ...    (2.4)

where N denotes the length of the data sample.
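
As a small illustration of the sample covariances, the sketch below averages
the N-n available lagged products for each lag n. It assumes the
normalization by N-n read from the definition above; the function name is
hypothetical.

```python
import numpy as np

def sample_covariances(y, max_lag):
    """Return [r_0, ..., r_max_lag], each averaged over the N - n available products."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    return np.array([np.dot(y[: N - n], y[n:]) / (N - n) for n in range(max_lag + 1)])

r_hat = sample_covariances([1.0, 2.0, 3.0, 4.0], max_lag=2)
print(r_hat)
```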


Collecting the amplitudes $\{\alpha_i\}$, phases $\{\phi_i\}$ and frequencies
$\{\omega_i\}$ in a single parameter vector, we define

    $\theta = [\alpha_1, \ldots, \alpha_m, \phi_1, \ldots, \phi_m, \omega_1, \ldots, \omega_m]^T$ .    (2.5)

The problem considered in this paper is the estimation of $\theta$ from N samples
of noisy measurements {y(1),...,y(N)}.
3.  THE INITIAL OVERDETERMINED YULE-WALKER ESTIMATES

As is well known, x(t), (2.1), obeys a homogeneous difference equation of
order 2m,

    $x(t) + a_1 x(t-1) + \ldots + a_n x(t-n) = 0$ ,  $n \ge 2m$ ,    (3.1)

where $\{a_i\} \in R$ are such that the polynomial

    $A(z) = 1 + a_1 z + \ldots + a_n z^n$     (3.2a)

has zeros located on the unit circle at $e^{\pm i\omega_k}$, i.e.,

    $A(e^{\pm i\omega_k}) = 0$ ,  k = 1, ..., m .    (3.2b)

See [2],[4],[5],[13]. Since we have

    $r_n = E\{x(t) x(t+n)\} + \lambda^2 \delta_{n,0}$ ,    (3.3a)

where $\delta_{i,j}$ is the Kronecker delta

    $\delta_{i,j} = 1$ for $i = j$ ,  $\delta_{i,j} = 0$ for $i \ne j$ ,    (3.3b)

it follows from (3.1) that the coefficients $\{a_i\}$ obey the so-called
(modified) Yule-Walker equations

    $r_k + a_1 r_{k-1} + \ldots + a_n r_{k-n} = 0$ ,  $k \ge n+1$ .    (3.4)

A commonly used technique for estimating the frequencies $\{\omega_i\}$ is based on
(3.4). Consistent estimates $\{\hat{a}_i\}$ can be obtained by solving the following
linear system of equations.
    $\begin{bmatrix} \hat{r}_n & \cdots & \hat{r}_1 \\ \vdots & & \vdots \\ \hat{r}_{L-1} & \cdots & \hat{r}_{L-n} \end{bmatrix} \begin{bmatrix} \hat{a}_1 \\ \vdots \\ \hat{a}_n \end{bmatrix} = - \begin{bmatrix} \hat{r}_{n+1} \\ \vdots \\ \hat{r}_L \end{bmatrix}$ ,  $L \ge 2n$ ,    (3.5)

where $\{\hat{r}_k\}$ are the sample covariances. The matrix appearing in (3.5) has
full rank, at least for large N, [14]. Note that for L > 2n the system (3.5)
is overdetermined and needs to be solved in a least-squares sense.

Intuitively we can expect that the larger L, the more accurate will be the
estimates $\{\hat{a}_i\}$, since the covariances for large lags contain "useful
information" about the covariance structure of the data. While it is not
always true that increasing L increases estimation accuracy [12], it was shown
by simulations [3],[6] that increasing L is often useful. A theoretical
explanation of this empirically noticed fact was recently presented in [12].
It was shown there that while the asymptotic (for $N \to \infty$) accuracy of
$\{\hat{a}_i\}$ does not increase monotonically with L, it improves considerably in
the limit as $L \to \infty$. For $L < \infty$ the estimation errors $(\hat{a}_i - a_i)$ are
of order $1/\sqrt{N}$, and for $L \to \infty$ they are of order $1/(L\sqrt{N})$. The
estimation technique based on (3.5) with L > 2n is the so-called
overdetermined Yule-Walker (OYW) method [3]-[6].


The frequencies can now be estimated by determining the roots of

    $\hat{A}(z) = 1 + \hat{a}_1 z + \ldots + \hat{a}_n z^n = 0$ .    (3.6)

Note that determining the estimates $\{\hat\omega_k\}$ from (3.6) implies, in general,
some approximations since $\hat{A}(z)$ is not guaranteed to have all of its zeros
on the unit circle. (For example, one may look at the peaks of
$1/|\hat{A}(e^{i\omega})|^2$, or at the angles of the roots of $\hat{A}(z)$.)
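
The OYW frequency estimation step can be sketched as follows. This is a
minimal illustration rather than the authors' implementation: it assumes the
sample covariances are normalized by N-k, that the equations used are those
for lags n+1 through L, and that the frequencies are read off as the angles
of the m upper-half-plane roots closest to the unit circle (one of the
options mentioned above). The function names and the test signal are
hypothetical.

```python
import numpy as np

def sample_covariances(y, max_lag):
    N = len(y)
    return np.array([np.dot(y[: N - k], y[k:]) / (N - k) for k in range(max_lag + 1)])

def oyw_frequencies(y, m, n, L):
    """Estimate m frequencies from an overdetermined Yule-Walker fit of order n."""
    r = sample_covariances(np.asarray(y, dtype=float), L)
    # Rows k = n+1..L of  r_k + a_1 r_{k-1} + ... + a_n r_{k-n} = 0.
    R = np.array([[r[k - i] for i in range(1, n + 1)] for k in range(n + 1, L + 1)])
    rhs = -r[n + 1 : L + 1]
    a, *_ = np.linalg.lstsq(R, rhs, rcond=None)  # least-squares solution
    # Roots of A(z) = 1 + a_1 z + ... + a_n z^n (np.roots wants highest power first).
    roots = np.roots(np.concatenate(([1.0], a))[::-1])
    upper = roots[roots.imag > 0]
    upper = upper[np.argsort(np.abs(np.abs(upper) - 1.0))][:m]
    return np.sort(np.angle(upper))

rng = np.random.default_rng(0)
t = np.arange(1, 2001)
y = 1.5 * np.sin(0.8 * t + 0.3) + 0.1 * rng.standard_normal(t.size)
print(oyw_frequencies(y, m=1, n=2, L=10))
```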


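The problem of determining estimates of $\{\alpha_k\}$ and $\{\phi_k\}$ once estimates
of the frequencies are given, can be reduced to a least-squares fit.
Rewrite (2.1), (2.2) as

    $y(t) = \sum_{k=1}^{m} (\bar{a}_k \sin\omega_k t + b_k \cos\omega_k t) + e(t)$ ,    (3.7a)

where

    $\bar{a}_k = \alpha_k \cos\phi_k$ ,  $b_k = \alpha_k \sin\phi_k$ .    (3.7b)

Replacing $\{\omega_k\}$ in (3.7) by their estimates $\{\hat\omega_k\}$, the problem of
estimating $\{\bar{a}_k, b_k\}$ can be formulated as the following minimization
problem:

    $\min_{\{\bar{a}_k, b_k\}} \sum_{t=1}^{M} \{ y(t) - \sum_{k=1}^{m} (\bar{a}_k \sin\hat\omega_k t + b_k \cos\hat\omega_k t) \}^2$ ,  $M \le N$ .    (3.8)

The solution to this problem is given by

    $\hat\psi = \{ \sum_{t=1}^{M} V(t) V(t)^T \}^{-1} \{ \sum_{t=1}^{M} V(t) y(t) \}$ ,    (3.9a)

    $V(t) = [\sin\hat\omega_1 t, \ldots, \sin\hat\omega_m t, \cos\hat\omega_1 t, \ldots, \cos\hat\omega_m t]^T$ ,    (3.9b)

where $\hat\psi$ collects the estimates of $\{\bar{a}_k\}$ followed by those of $\{b_k\}$.
The reason for not using all of the N data points in (3.8), (3.9) will be
explained later. It will be shown that if M in (3.8) is too large (e.g. M=N)
then the estimation accuracy may deteriorate considerably. Note that for
M < N we also get a smaller computational burden.

Using the estimates of $\{\bar{a}_i\}$ and $\{b_i\}$ in (3.7b) we readily obtain
estimates of $\{\alpha_i\}$ and $\{\phi_i\}$ as given by

    $\hat\alpha_i = (\hat{\bar{a}}_i^2 + \hat{b}_i^2)^{1/2}$ ,
    $\hat\phi_i = \mathrm{arctg}(\hat{b}_i / \hat{\bar{a}}_i)$  (mod $2\pi$),    i = 1, ..., m .    (3.10)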
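
The least-squares fit and the conversion back to amplitudes and phases can be
sketched as below. This is an illustration under our own conventions: the
regressor vector stacks the sines first and the cosines second, the linear
problem is solved with a generic least-squares routine instead of forming the
normal equations, and the four-quadrant arctangent is used so that the phase
lands in the right quadrant. Names are hypothetical.

```python
import numpy as np

def amplitudes_and_phases(y, omegas, M=None):
    """Fit sin/cos coefficients at given frequencies; return (alpha, phi mod 2*pi)."""
    y = np.asarray(y, dtype=float)
    omegas = np.atleast_1d(np.asarray(omegas, dtype=float))
    M = len(y) if M is None else M
    t = np.arange(1, M + 1)
    # Each row is V(t)^T = [sin(w_1 t), ..., sin(w_m t), cos(w_1 t), ..., cos(w_m t)].
    V = np.hstack([np.sin(np.outer(t, omegas)), np.cos(np.outer(t, omegas))])
    psi, *_ = np.linalg.lstsq(V, y[:M], rcond=None)
    m = len(omegas)
    a_bar, b = psi[:m], psi[m:]
    alpha = np.hypot(a_bar, b)
    phi = np.mod(np.arctan2(b, a_bar), 2.0 * np.pi)
    return alpha, phi

t = np.arange(1, 101)
y = 2.0 * np.sin(0.7 * t + 0.5) + 0.5 * np.sin(1.9 * t + 4.0)
print(amplitudes_and_phases(y, [0.7, 1.9]))
```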


Next we discuss some implementation issues related to (3.9). Straightforward
programming of (3.9) would lead to a large computational burden. The main
reason is that calculation of trigonometric functions on a computer is
time-consuming. Note, however, that the solution $c_i(t)$ of the following
second-order difference equation

    $c_i(t) - (2\cos\hat\omega_i) c_i(t-1) + c_i(t-2) = 0$ ,  t = 3, 4, ...    (3.11a)

with initial conditions

    $c_i(1) = \cos\hat\omega_i$ ,  $c_i(2) = \cos 2\hat\omega_i$ ,    (3.11b)

is given by

    $c_i(t) = \cos\hat\omega_i t$ ,  t = 1, 2, ...    (3.11c)

A different set of initial conditions ($c_i(1) = \sin\hat\omega_i$,
$c_i(2) = \sin 2\hat\omega_i$) will produce $c_i(t) = \sin\hat\omega_i t$. Thus, the sequences
$\{\sin\hat\omega_i t, \cos\hat\omega_i t;\ t = 1, \ldots, M;\ i = 1, \ldots, m\}$ can be generated
using (3.11) at a cost of approximately 2mM multiplications, and the vector
$\sum V(t) y(t)$ in (3.9) will require a total of 4mM multiplications.
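
The recursion can be sketched as follows. The code is an illustration only
(the function name is ours): it seeds the two sequences with direct
evaluations at t = 1 and t = 2 and then uses one multiplication per new
sample and per sequence.

```python
import numpy as np

def cos_sin_by_recursion(omega, M):
    """Generate cos(omega*t) and sin(omega*t), t = 1..M, by a second-order recursion."""
    two_cos = 2.0 * np.cos(omega)
    c = np.empty(M)
    s = np.empty(M)
    c[0], s[0] = np.cos(omega), np.sin(omega)          # t = 1
    if M > 1:
        c[1], s[1] = np.cos(2.0 * omega), np.sin(2.0 * omega)  # t = 2
    for k in range(2, M):                               # index k holds time t = k + 1
        c[k] = two_cos * c[k - 1] - c[k - 2]
        s[k] = two_cos * s[k - 1] - s[k - 2]
    return c, s

c, s = cos_sin_by_recursion(0.37, 500)
t = np.arange(1, 501)
print(np.max(np.abs(c - np.cos(0.37 * t))), np.max(np.abs(s - np.sin(0.37 * t))))
```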

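Next we present an efficient way for computing the matrix $\sum V(t) V(t)^T$ in
(3.9). It follows from Lemma A.1 in the Appendix that

    $\sum_{t=1}^{M} \sin\hat\omega_i t \, \sin\hat\omega_j t = \frac{1}{2} \sum_{t=1}^{M} [\cos\omega_{ij} t - \cos\bar\omega_{ij} t]$

    $= \frac{1}{2} \left\{ \frac{\sin(M\omega_{ij}/2) \cos((M+1)\omega_{ij}/2)}{\sin(\omega_{ij}/2)} - \frac{\sin(M\bar\omega_{ij}/2) \cos((M+1)\bar\omega_{ij}/2)}{\sin(\bar\omega_{ij}/2)} \right\}$ ,    (3.12)

where

    $\omega_{ij} = \hat\omega_i - \hat\omega_j$ ,  $\bar\omega_{ij} = \hat\omega_i + \hat\omega_j$ .

Similar expressions can be derived for the other elements of the matrix in
(3.9). For large M we can further simplify the computations by using some
approximations. From Lemma A.1 it follows that

    $\frac{1}{M} \sum_{t=1}^{M} V(t) V(t)^T = \frac{1}{2} I + O(\frac{1}{M})$ .    (3.13)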

In (3.13) we tacitly assumed that $\hat\omega_i \ne \hat\omega_j$ for $i \ne j$. If this is
not the case, we can work with $\hat\omega_i$ and $\hat\omega_j$ slightly corrected as

    $\hat\omega_i + \varepsilon$ ,    (3.14)

for some $\varepsilon$ of order $1/(L\sqrt{N})$. We conclude from (3.13) that for large M
the following simple estimate

    $\tilde\psi = \frac{2}{M} \sum_{t=1}^{M} V(t) y(t)$ ,    (3.15)

is an approximation of order 1/M of $\hat\psi$, (3.9). Note, however, that the
smaller $\inf_{i \ne j} |\hat\omega_i - \hat\omega_j|$ the larger the value of M needed for the
approximation in (3.13) to be valid (see the discussion in the appendix and
also equation (3.12)). If M is not large enough then $\tilde\psi$ may not be a good
approximation of $\hat\psi$. Furthermore, the calculation of $\hat\psi$ may be
problematic in such a case since the matrix in (3.9) will be ill-conditioned.
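
The difference between the exact least-squares solution and the simplified
large-M estimate can be examined with a short sketch. It is an illustration
under our own assumptions: noise-free data, two hypothetical well-separated
frequencies, and a generic least-squares solver for the exact fit.

```python
import numpy as np

def regressors(omegas, M):
    """Matrix whose rows are V(t)^T for t = 1..M (sines first, then cosines)."""
    t = np.arange(1, M + 1)
    return np.hstack([np.sin(np.outer(t, omegas)), np.cos(np.outer(t, omegas))])

def exact_and_simplified(y, omegas):
    """Return the least-squares solution and the (2/M) * sum V(t) y(t) approximation."""
    V = regressors(np.atleast_1d(omegas), len(y))
    psi_exact, *_ = np.linalg.lstsq(V, y, rcond=None)
    psi_simple = 2.0 / len(y) * (V.T @ y)
    return psi_exact, psi_simple

omegas = np.array([0.6, 1.4])
psi_true = np.array([1.0, -0.5, 0.3, 2.0])
for M in (50, 500, 4000):
    y = regressors(omegas, M) @ psi_true
    exact, simple = exact_and_simplified(y, omegas)
    print(M, np.max(np.abs(simple - exact)))
```

Moving the two frequencies closer together makes the simplified estimate
degrade for a fixed M, in line with the remark above.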

We conclude this section with a discussion of the asymptotic properties
of the estimates introduced above. The frequency estimates $\{\hat\omega_k\}$ obtained
by the OYW method are consistent [15]. The asymptotic (as $N, L \to \infty$)
standard deviations of $\{\hat\omega_k\}$ are of order $1/(L\sqrt{N})$, provided that L
increases not faster than $N^{\gamma}$, with $\gamma < 1/2$ [12]. The condition
$\gamma < 1/2$ is sufficient but probably not necessary. A necessary and sufficient
condition on $\gamma$ is not known. Since the CRLB on the standard deviation of
$\hat\omega_k$ is $O(1/N^{3/2})$ as is shown in the appendix, it seems possible to
improve significantly the accuracy of the OYW estimates.

An analysis of the asymptotic behavior of $\{\hat\alpha_i, \hat\phi_i\}$, (3.9), (3.10),
does not seem to be available in the literature. Due to the use in (3.9) of
$\{\hat\omega_i\}$ instead of $\{\omega_i\}$ such an analysis is not so easy. Since
$\{\hat\alpha_i, \hat\phi_i\}$ are used as initial estimates, their accuracy is not too
important, and will not be discussed in detail. What is, however, quite
important is the choice of M in (3.9). To simplify notation, we will consider
the case of a single sinusoid (m=1). It should be emphasized, however, that
the same conclusions apply also for m > 1.


For m=1 and large M we have from (3.9), (3.13),


,  M  si  nut  *  .  , 

i’"'!'  *  w  I  *  {y(t)-[sinut  cosutli))}  +  0[i)  = 

t^l  COSut 


,  M  sinut  -)  M  I  tcosut 


tsi nut  j 
COSutJ 


[tcosut  -tsinutlip  (  (u-(u)  + 


I  -t  sin  ut 

t»l  '  -t  COS  ut 


tCOSut 


•tsinuti 


[tcosut  -tsinut]\|)  + 


si  nut  I 

*  Ct  si  nut,  t  cosutlij)  1  [u~u)  +  .  »  •  +  0(1/M)  , 

_COSutJ  > 


(3.16) 


where $\psi \triangleq [\bar{a}, b]^T$ is the vector of the true parameters. It is not
difficult to see that the first term in (3.16) is $O(1/\sqrt{M})$. Since
$\hat\omega - \omega = O(1/(L\sqrt{N}))$, see the discussion above, it can be shown that
the second term is $O(M/(L\sqrt{N}))$, the third is $O(M^2/(L^2 N))$, etc. Thus if
M increases faster than $L\sqrt{N}$ (for example, if we set $L = N^{1/2-\delta}$ for
some $\delta > 0$, and M=N), then difficulties may occur. Indeed, in such a case
the estimate $\hat\psi$ may not be consistent. The condition $M \ll L\sqrt{N}$ must
be imposed. Then the first and second terms in (3.16) are asymptotically the
dominant ones. Note that the magnitude of the first term decreases with M
while that of the second increases with M. To get good asymptotic properties
for $\hat\psi$ (i.e., small estimation error $\hat\psi - \psi$), M should be chosen such
that these two terms have the same magnitude, i.e. $1/\sqrt{M} \sim M/(L\sqrt{N})$.
Thus the "optimal" rate of increase of M is given by

    $M = (L\sqrt{N})^{2/3}$ .    (3.17)

The estimation error $(\hat\psi - \psi)$ corresponding to this choice of M is of the
order $1/\sqrt{M}$.


4.  A  MAXIMUM  LIKELIHOOD  ALGORITHM 


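The ML estimate of $\theta$ is obtained as the minimum point of the following
loss function (see [7],[8] and also the appendix)

    $LF = \sum_{t=1}^{N} \varepsilon^2(t,\theta)$ ,    (4.1a)

where

    $\varepsilon(t,\theta) = y(t) - \sum_{i=1}^{m} \alpha_i \sin(\omega_i t + \phi_i)$ .    (4.1b)

We use the Gauss-Newton algorithm to minimize (4.1). Let $\theta^k$ denote the
parameter estimate at iteration k. The updated estimate $\theta^{k+1}$ is computed by

    $\theta^{k+1} = \theta^k - [\sum_{t=1}^{N} \varepsilon_\theta(t,\theta^k) \varepsilon_\theta^T(t,\theta^k)]^{-1} [\sum_{t=1}^{N} \varepsilon_\theta(t,\theta^k) \varepsilon(t,\theta^k)]$ ,    (4.2a)

where

    $\varepsilon_\theta(t,\theta) = \partial\varepsilon(t,\theta)/\partial\theta$ ,    (4.2b)

and where we set

    $\theta^0 = \hat\theta$ - the OYW estimate.    (4.2c)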

The elements of the gradient vector $\varepsilon_\theta(t,\theta)$ are given by

    $\partial\varepsilon(t,\theta)/\partial\alpha_i = -\sin(\omega_i t + \phi_i)$ ,
    $\partial\varepsilon(t,\theta)/\partial\phi_i = -\alpha_i \cos(\omega_i t + \phi_i)$ ,    i = 1, ..., m .    (4.3)
    $\partial\varepsilon(t,\theta)/\partial\omega_i = -t \alpha_i \cos(\omega_i t + \phi_i)$ ,

The matrix to be inverted in (4.2) contains entries of very different
magnitudes. The elements of its left-top m x m block are of order N, while
those of the right-bottom m x m block are of the order $N^3$. Thus, it is
desirable from the numerical standpoint to "balance" the elements of the
matrix. This will also be convenient for some subsequent theoretical
considerations.

Let us introduce the notation

    $K_N = \begin{bmatrix} N^{1/2} I_{2m} & 0 \\ 0 & N^{3/2} I_m \end{bmatrix}$ ,    (4.4)

where $I_K$ denotes the K x K identity matrix. The following recursion is
equivalent to, but numerically more reliable than, (4.2a)


    $K_N \theta^{k+1} = K_N \theta^k - H_N^{-1}(\theta^k) [K_N^{-1} \sum_{t=1}^{N} \varepsilon_\theta(t,\theta^k) \varepsilon(t,\theta^k)]$ ,    (4.5a)

where

    $H_N(\theta) = K_N^{-1} [\sum_{t=1}^{N} \varepsilon_\theta(t,\theta) \varepsilon_\theta^T(t,\theta)] K_N^{-1}$ .    (4.5b)

Evaluation of the vector $\sum_{t=1}^{N} \varepsilon_\theta(t,\theta^k) \varepsilon(t,\theta^k)$ is
straightforward. Its elements contain trigonometric functions which could be
computed efficiently by the technique discussed in the previous section.
Evaluation of the matrix $H_N(\theta^k)$ can be done similarly but it appears quite
costly. To overcome this difficulty we propose an approximate version of the
iterative algorithm (4.2a).

As is shown in the Appendix,

    $H_N^{-1}(\theta) = G(\theta) + O(1/N)$ ,    (4.6)

where 


Replacing $H_N^{-1}(\theta^k)$ in (4.5) by its large sample approximation $G(\theta^k)$
we get,

    $K_N \theta^{k+1} = K_N \theta^k - \mu_k G(\theta^k) [K_N^{-1} \sum_{t=1}^{N} \varepsilon_\theta(t,\theta^k) \varepsilon(t,\theta^k)]$ ,    (4.8)

where $\{\mu_k\}$ is a sequence of positive scalars which can be used for
controlling the step size ($\mu_k$ can be determined, for example, by using a
line search algorithm). The algorithm (4.8) is much simpler than (4.2a). The
two algorithms have clearly the same convergence point. Furthermore, for large
N they will also have similar convergence rates.
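
The simplified recursion can be sketched as below. This is a minimal
illustration, not the authors' code, and it rests on several assumptions of
ours: the parameter vector is ordered as amplitudes, phases, frequencies; the
scaling matrix has N^(1/2) on the amplitude and phase entries and N^(3/2) on
the frequency entries; the step size is fixed at 1; and the closed form used
for G is our own large-sample evaluation of the inverse of the scaled
Gauss-Newton matrix from the gradients above (block values 2, 8, -12 and 24
divided by the squared amplitudes where indicated), not a transcription of the
paper's expression. All names are hypothetical.

```python
import numpy as np

def residual_and_gradient(theta, y):
    """Residuals eps(t, theta) and the 3m x N matrix of their derivatives."""
    m = len(theta) // 3
    alpha, phi, omega = theta[:m], theta[m:2 * m], theta[2 * m:]
    t = np.arange(1, len(y) + 1)
    arg = np.outer(omega, t) + phi[:, None]
    resid = y - (alpha[:, None] * np.sin(arg)).sum(axis=0)
    d_alpha = -np.sin(arg)
    d_phi = -alpha[:, None] * np.cos(arg)
    d_omega = t * d_phi
    return resid, np.vstack([d_alpha, d_phi, d_omega])

def g_matrix(alpha):
    """Assumed large-sample inverse of the scaled Gauss-Newton matrix."""
    m = len(alpha)
    inv_a2 = np.diag(1.0 / alpha ** 2)
    zero = np.zeros((m, m))
    return np.block([[2.0 * np.eye(m), zero, zero],
                     [zero, 8.0 * inv_a2, -12.0 * inv_a2],
                     [zero, -12.0 * inv_a2, 24.0 * inv_a2]])

def simplified_gauss_newton(y, theta0, n_iter=50, mu=1.0):
    theta = np.array(theta0, dtype=float)
    N, m = len(y), len(theta0) // 3
    k_diag = np.concatenate([np.full(2 * m, N ** 0.5), np.full(m, N ** 1.5)])
    for _ in range(n_iter):
        resid, grad = residual_and_gradient(theta, y)
        scaled_gradient = (grad @ resid) / k_diag       # K^{-1} * sum(eps_theta * eps)
        theta = theta - mu * (g_matrix(theta[:m]) @ scaled_gradient) / k_diag
    return theta

t = np.arange(1, 201)
y = np.sin(0.5 * t + 0.3)                               # noise-free test signal
print(simplified_gauss_newton(y, [0.95, 0.28, 0.5005]))
```

No matrix inversion is needed inside the loop, which is the practical appeal
of the simplified recursion; a line search on the step size could replace the
fixed value used here.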

We conclude this section with a discussion of the asymptotic accuracy of
the limiting (as $k \to \infty$) estimate obtained by (4.8). Let this estimate be
denoted by $\hat\theta$,

    $\hat\theta = \lim_{k\to\infty} \theta^k$ .    (4.9)

Since we initialize the recursion (4.8) with a consistent estimate, it is
expected to converge in a few iterations. In fact, paralleling the
calculations in the proof of Theorem 4.1 below it is possible to show that
(4.8) will asymptotically (as $N \to \infty$) converge in one iteration provided
that L in (3.5) tends to infinity faster than $\sqrt{N}$.


Under the Gaussian hypothesis, $\hat\theta$ is the ML estimate. We expect,
therefore, that its asymptotic covariance matrix equals the CRLB
$P_{CR}^{N} = \lambda^2 G(\theta)$, see the appendix for the derivation of $P_{CR}^{N}$.
However, this does not follow immediately since some of the standard
assumptions of ML theory [10] fail to hold in our case (e.g.
$\varepsilon_\theta(t,\theta)$ is a nonstationary process).

If we relax the Gaussian hypothesis, then $\hat\theta$ is the prediction error (PE)
estimate [16]. Again, the standard PE theory does not apply to our
problem. If it were applicable it would follow from [16] that the asymptotic
covariance matrix of $\hat\theta$ is still given by $P_{CR}^{N}$.


The asymptotic covariance matrix of the normalized estimation errors
$K_N(\hat\theta - \theta)$ is derived next. We show that this matrix equals
$P_{CR}^{N}$, for any distribution function of the data.

Theorem 4.1. Consider the process y(t) generated by (2.1), (2.2) under the
assumptions stated except that e(t) is allowed to be non-Gaussian. Let
$\hat\theta$ be the estimate given by (4.9). Then

    $\lim_{N\to\infty} E[K_N (\hat\theta - \theta)(\hat\theta - \theta)^T K_N] = P_{CR}^{N}$ ,    (4.10)

where $K_N$ is defined in (4.4) and $P_{CR}^{N} = \lambda^2 G(\theta)$.


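Proof: Note that

    $K_N^{-1} \sum_{t=1}^{N} \varepsilon_\theta(t,\hat\theta) \varepsilon(t,\hat\theta) = 0$ .    (4.11)

Thus, for large N we can write

    $0 = K_N^{-1} \sum_{t=1}^{N} \varepsilon_\theta(t,\theta) e(t) + F(\theta) K_N (\hat\theta - \theta) + \frac{1}{2} \sum_{i=1}^{3m} \frac{\partial F(\theta)}{\partial\theta_i} (\hat\theta_i - \theta_i) K_N (\hat\theta - \theta) + \ldots$ ,    (4.12)

where $\theta_i$ is the i-th component of $\theta$ and

    $F(\theta) = K_N^{-1} \sum_{t=1}^{N} \{ \varepsilon_\theta(t,\theta) \varepsilon_\theta^T(t,\theta) + \varepsilon_{\theta\theta}(t,\theta) \varepsilon(t,\theta) \} K_N^{-1}$ .    (4.13)

The first term in (4.12) is asymptotically independent of N. To see this note
that its asymptotic covariance matrix, say P, is given by (see (4.5b) and
(4.6))

    $P \triangleq \lim_{N\to\infty} K_N^{-1} E\{ \sum_{t=1}^{N} \sum_{s=1}^{N} \varepsilon_\theta(t,\theta) \varepsilon_\theta^T(s,\theta) e(t) e(s) \} K_N^{-1}$
    $= \lambda^2 \lim_{N\to\infty} K_N^{-1} [\sum_{t=1}^{N} \varepsilon_\theta(t,\theta) \varepsilon_\theta^T(t,\theta)] K_N^{-1} = \lambda^4 (P_{CR}^{N})^{-1}$ ,    (4.14)

where $P_{CR}^{N}$ is defined in the Appendix. The last equality in (4.14) is also
proven in the Appendix.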


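Next we show that for large N

    $K_N^{-1} [\sum_{t=1}^{N} \varepsilon_{\theta\theta}(t,\theta) e(t)] K_N^{-1} = O(1/\sqrt{N})$ .    (4.15)

The matrix $\varepsilon_{\theta\theta}(\cdot,\cdot)$ of second-order derivatives is given by

    $\varepsilon_{\theta\theta}(t,\theta) = \begin{bmatrix} 0 & -\mathrm{diag}[\cos(\omega_i t + \phi_i)] & -\mathrm{diag}[t \cos(\omega_i t + \phi_i)] \\ -\mathrm{diag}[\cos(\omega_i t + \phi_i)] & \mathrm{diag}[\alpha_i \sin(\omega_i t + \phi_i)] & \mathrm{diag}[t \alpha_i \sin(\omega_i t + \phi_i)] \\ -\mathrm{diag}[t \cos(\omega_i t + \phi_i)] & \mathrm{diag}[t \alpha_i \sin(\omega_i t + \phi_i)] & \mathrm{diag}[t^2 \alpha_i \sin(\omega_i t + \phi_i)] \end{bmatrix}$ ,    (4.16)

where each block of the matrix has size m x m. The generic element of the
matrix in (4.15) can therefore be written as

    $V_N = \frac{\beta}{N^{a+1}} \sum_{t=1}^{N} t^a \sin(\omega t + \phi) e(t)$ ,    (4.17)

where a = {0, 1 or 2}, $\beta = \{\pm\alpha_i$ or $\pm 1\}$, $\omega = \omega_i$, and
$\phi = \phi_i$ or $\phi_i + \pi/2$. The variance of $V_N$ is readily evaluated:

    $E\{V_N^2\} = \frac{\beta^2}{N^{2a+2}} \sum_{t=1}^{N} \sum_{s=1}^{N} t^a s^a \sin(\omega t + \phi) \sin(\omega s + \phi) E\{e(t) e(s)\} = \frac{\beta^2 \lambda^2}{N^{2a+2}} \sum_{t=1}^{N} t^{2a} \sin^2(\omega t + \phi)$ .

Thus,

    $E\{V_N^2\} = O(1/N)$ ,

which proves (4.15).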


It follows from the calculations above and from the Appendix that

    $F(\theta) = \lambda^2 (P_{CR}^{N})^{-1} + O(1/\sqrt{N})$ .

Next we show that the higher-order terms in (4.12) can be neglected
asymptotically. Note that

    $\partial F(\theta)/\partial\theta_i$ is of the same order of magnitude as $F(\theta)$, for i = 1, ..., 2m ,

    $\partial F(\theta)/\partial\theta_i$ is of the order of magnitude of $F(\theta) \cdot N$, for i = 2m+1, ..., 3m .

Since

    $(\hat\theta_i - \theta_i) = O(1/\sqrt{N})$ ,  i = 1, ..., 2m ,
    $(\hat\theta_i - \theta_i) = O(1/(N\sqrt{N}))$ ,  i = 2m+1, ..., 3m ,

and since $F(\theta)$ is asymptotically independent of N as shown above, we
conclude that the higher-order terms in (4.12) are $O(1/\sqrt{N})$. Thus, for
large N,

    $K_N (\hat\theta - \theta) \approx -F^{-1}(\theta) K_N^{-1} \sum_{t=1}^{N} \varepsilon_\theta(t,\theta) e(t)$ ,

which implies that, cf. (4.14),

    $\lim_{N\to\infty} E[K_N (\hat\theta - \theta)(\hat\theta - \theta)^T K_N] = \frac{1}{\lambda^2} P_{CR}^{N} \, P \, \frac{1}{\lambda^2} P_{CR}^{N} = P_{CR}^{N}$ .

It follows from the result above that in the case of Gaussian data, the
estimate $\hat\theta$ of $\theta$ is asymptotically efficient. For non-Gaussian data,
$\hat\theta$ will asymptotically be the minimum-variance estimate in a fairly
large class of estimators whose covariance matrices depend only on the
second-order statistics of the data.
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5.  THE  PROBLEM  OF  LOCAL  MINIMA 


A  major  concern  1n  any  iterative  minimization  algorithm  is  the  presence 
of  local  minima  in  the  function  to  be  minimized.  Below  we  analyze  the  shape 

A 

of  the  Loss  Function  (LF).  For  an  arbitrary  parameter  vector  9  we  can 

A 

express  LF(9)  as. 


LF(9)  =  N.{LF  (9)  +  LF  (9)  +  R} 


(5.1a) 


where 


N 


LFjIa)  =  -  i{t)]2 


LFn(i)  “  f  ^  ^  e(t)[x(t)  -  ilt)] 


1  ”  7 

R  =  ^  1;  e^(t) 


(5.1b) 
(5.1c) 
(5. Id) 


t=l 


and  where  x(t)  is  defined  as  in  (2.1)  but  with  elements  of  9  replacing 


elements  of  9  there.  Comparing  (5.1c)  and  (4.17),  we  see  that  1-F^(9)  is 


0(1//Tr)  .  Also,  writing  out  (5.1a)  and  using  (3.12)  (with  M»N),  it  is  easy 
to  show  that 


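    $LF_I(\hat\theta) = \frac{1}{N} \sum_{i=1}^{m} \sum_{t=1}^{N} [\hat\alpha_i \sin(\hat\omega_i t + \hat\phi_i) - \alpha_i \sin(\omega_i t + \phi_i)]^2 + O(1/N)$

    $\qquad\quad = \sum_{i=1}^{m} F_i(\hat\alpha_i, \hat\omega_i, \hat\phi_i) + O(1/N)$

where

    $F_i(\hat\alpha_i, \hat\omega_i, \hat\phi_i) = \frac{1}{N} \sum_{t=1}^{N} [\hat\alpha_i \sin(\hat\omega_i t + \hat\phi_i) - \alpha_i \sin(\omega_i t + \phi_i)]^2$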


Thus, to within O(1/N), $LF_I(\hat\theta)$ is the sum of m decoupled functions $F_i$;
moreover, all of the $F_i$'s have the same form. Understanding the shape of $LF_I$
asymptotically reduces to understanding the shape of the function

    $F(\hat\alpha, \hat\omega, \hat\phi) = \frac{1}{N} \sum_{t=1}^{N} [\hat\alpha \sin(\hat\omega t + \hat\phi) - \alpha \sin(\omega t + \phi)]^2$    (5.3)
It is easy to check that F is quadratic in $\hat\alpha$ and sinusoidal in $\hat\phi$; thus, the
local minima of F with respect to these two variables are the global minima.
However, F is not so well-behaved as a function of $\hat\omega$. A plot of $F(\hat\omega)$ for
N = 40, $\omega = 0.4\pi$, $\hat\phi = \phi = 0$, and $\hat\alpha = \alpha = 1$ is shown in Figure 5.1. From this
figure it is apparent that the initial estimate of $\omega$ must be within the deep
valley if we expect the Gauss-Newton algorithm to find the global minimum.

In Appendix B we show that the width of the valley is in the range
$\Delta\omega \in [2\pi/N,\ 8\pi/N]$. Thus, the initial estimate must have a standard deviation
on the order of $2\pi/N$. However, the standard deviation of $\hat\omega$ estimates

1/2 

obtained  from  (3.5)  are  Oll/L/TD  ,  when  L  <  N  which  asymptotically  is  too 
large  for  use  with  the  Gauss-Newton  method.  Thus,  we  need  to  improve  the 
initial  frequency  estimates  before  starting  the  minimization. 
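
The multimodal shape of F as a function of the frequency estimate can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates (5.3) on a grid with the amplitude and phase estimates held at their true values, and it assumes the setting N = 40, omega = 0.4*pi, alpha = 1, phi = 0 (the reading of the Figure 5.1 parameters; the function names and grid resolution are illustrative choices).

```python
import math


def loss_f(omega_hat, omega=0.4 * math.pi, n=40, alpha=1.0, phi=0.0):
    """F of (5.3) as a function of the frequency estimate only.

    Amplitude and phase estimates are held at their true values, so the
    only mismatch is between omega_hat and omega.
    """
    total = 0.0
    for t in range(1, n + 1):
        diff = (alpha * math.sin(omega_hat * t + phi)
                - alpha * math.sin(omega * t + phi))
        total += diff * diff
    return total / n


def local_minima(values):
    """Indices of strict interior local minima of a sampled curve."""
    return [k for k in range(1, len(values) - 1)
            if values[k] < values[k - 1] and values[k] < values[k + 1]]


# Sample omega_hat on a fine grid in (0, pi); the grid contains 0.4*pi.
GRID = [k * math.pi / 2000 for k in range(1, 2000)]
CURVE = [loss_f(w) for w in GRID]
MINIMA = local_minima(CURVE)
BEST = min(MINIMA, key=lambda k: CURVE[k])

if __name__ == "__main__":
    print("number of local minima:", len(MINIMA))
    print("deepest minimum at omega_hat / pi =", GRID[BEST] / math.pi)
```

A descent method started in any of the shallow side valleys would stop there, which is why the initial frequency estimate has to fall inside the narrow valley around the true frequency.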

It is known [7] that for $N \to \infty$ the ML estimates of $\{\omega_i\}$ are given by
the maxima of the periodogram. Therefore, one method for improving initial
frequency estimates is to search in some small interval about each $\hat\omega_i$, say
$[\hat\omega_i - \epsilon,\ \hat\omega_i + \epsilon]$, for the maximum of the periodogram. Specifically, the
following method can be used.

Choose appropriate values for $\Delta\omega$ and $l_{\max}$.

For each i = 1, 2, ..., m:

1) Compute the periodogram $a_{il}$ of the data at frequencies

    $\hat\omega_{il} = \hat\omega_i + l\,\Delta\omega, \qquad l = 0, \pm 1, \ldots, \pm l_{\max}$


using 


“it 


> 


where  s.  and  b.  are  computed  using  (3.15)  but  with  M»N 
*  4 

2) Choose as the new initial frequency estimate the $\hat\omega_{il}$ whose
corresponding $a_{il}$ is largest; compute the new initial amplitude and
phase estimates using (3.15).

From the above discussion, $\Delta\omega$ should be chosen less than $2\pi/N$ to ensure that
one of the $\hat\omega_{il}$ is in the deep trough of LF; in our simulations we used

Owe  fin-  ,  5-1  •  Moreover,  i  should  be  chosen  so  that 
■•oN  NJ  max 

A  A 

Pr[tt»^ is  sufficiently  large.  In  our  simulations  we 

chose $l_{\max} = 20$; however, more sophisticated procedures could be used. For
example, since the OYW method was used to obtain $\hat\omega_i$, $l_{\max}$ could be chosen
based on the asymptotic probability distribution of the $\hat\omega_i$ given in [12].
Finally, one must ensure that the search intervals for two adjacent frequencies
do not overlap.
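
The grid search described above can be sketched as follows. This is a hedged illustration rather than the procedure of the paper: the periodogram is evaluated directly from its definition instead of through (3.15), and the names `periodogram`, `refine_frequency`, `d_omega` and `l_max` are chosen here for illustration only.

```python
import math


def periodogram(y, omega):
    """Periodogram |sum_t y(t) exp(-j*omega*t)|^2 / N, with t = 1..N."""
    c = sum(v * math.cos(omega * t) for t, v in enumerate(y, start=1))
    s = sum(v * math.sin(omega * t) for t, v in enumerate(y, start=1))
    return (c * c + s * s) / len(y)


def refine_frequency(y, omega0, d_omega, l_max):
    """Return the grid frequency omega0 + l*d_omega, |l| <= l_max,
    at which the periodogram of y is largest."""
    candidates = [omega0 + l * d_omega for l in range(-l_max, l_max + 1)]
    return max(candidates, key=lambda w: periodogram(y, w))


if __name__ == "__main__":
    n = 200
    omega = 0.4 * math.pi
    y = [math.sin(omega * t - 0.5) for t in range(1, n + 1)]
    coarse = omega + 0.05              # deliberately poor initial estimate
    fine = refine_frequency(y, coarse, math.pi / n, 20)
    print("initial error:", coarse - omega, "refined error:", fine - omega)
```

With a grid spacing below $2\pi/N$ at least one candidate lands inside the main lobe, so the refined estimate is within about half a grid step of the periodogram peak whenever the search interval covers it.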


6.  NUMERICAL  EXAMPLES 


We present some numerical experiments that indicate the performance of
the proposed algorithm. We first consider the problem of estimating $\theta$ from a
signal of the form (2.1), where m = 2.


    $\alpha_1 = 1.0, \qquad \omega_1 = 0.4\pi, \qquad \phi_1 = -0.5$

    $\alpha_2 = 1.0, \qquad \omega_2 = 0.2\pi, \qquad \phi_2 = 1.0$

In all examples $L = \sqrt{N}$, and M is chosen as in (3.17). The white noise
variance $\lambda^2$ was varied so that the SNR ranges between 0 and 20 dB in 2.5 dB
increments. (Here, SNR is defined for each signal, i.e., $\mathrm{SNR} = \alpha_i^2 / 2\lambda^2$.)
For each SNR, 50 independent data sets were generated, and average sum-squared
errors (SSE) of the resulting estimates of $\theta$ were computed. The SSE is
defined as


hi. 

1»1 

where K is the number of independent estimates obtained (50 in these
simulations) and $\hat\theta^i$ is the i-th estimate vector.
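
As a small worked example of the per-signal SNR definition used here, $\mathrm{SNR} = \alpha_i^2 / 2\lambda^2$, the noise variance needed for a given SNR in dB can be computed as below. This is a sketch; the function name and the variable `SNR_GRID_DB` are illustrative and do not come from the paper.

```python
def noise_variance(alpha, snr_db):
    """Noise variance lambda^2 that gives SNR = alpha^2 / (2*lambda^2),
    with the SNR specified in dB."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return alpha * alpha / (2.0 * snr_linear)


# SNR values 0, 2.5, ..., 20 dB, as in the simulation setup.
SNR_GRID_DB = [2.5 * k for k in range(9)]

if __name__ == "__main__":
    for snr_db in SNR_GRID_DB:
        print(snr_db, "dB ->", noise_variance(1.0, snr_db))
```

For a unit-amplitude sinusoid, 0 dB corresponds to a noise variance of 0.5 and 10 dB to 0.05.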

The SSE of the estimated coefficients for N = 500 are shown in Figures 6.1-6.3.
In these (and the remaining) figures, initial estimates are those
obtained using the methods of section 3. Equation (3.15) was used for
estimates in these plots; however, the SSE for estimates obtained using (3.9)
are not significantly different (and in particular, no better on the
average). From these initial estimates, improved estimates were obtained as
outlined in the previous section, then the Gauss-Newton algorithm (equation
(4.8)) was used. In equation (4.8), $\mu$ was at each iteration set to 1; if LF
increased, $\mu$ was decreased by a factor of 4 until the resulting step was such
that LF decreased.
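
The step-size rule just described (start from a full step, shrink by a factor of 4 until the loss decreases) can be sketched generically. The code below is an illustration under stated assumptions: `loss` and `direction` stand in for LF and the Gauss-Newton direction of (4.8), which are not reproduced here, and the cap `max_tries` is an added safeguard that the text does not mention.

```python
def damped_step(loss, theta, direction, shrink=4.0, max_tries=20):
    """Take theta + mu*direction with mu = 1, dividing mu by `shrink`
    until the loss is lower than at theta.

    Returns (new_theta, mu). If no decrease is found within max_tries
    reductions, theta is returned unchanged with mu = 0.
    """
    base = loss(theta)
    mu = 1.0
    for _ in range(max_tries):
        trial = [p + mu * d for p, d in zip(theta, direction)]
        if loss(trial) < base:
            return trial, mu
        mu /= shrink
    return list(theta), 0.0


if __name__ == "__main__":
    # Toy loss x^4: the full step from x = 1 overshoots to x = -3.
    quartic = lambda th: th[0] ** 4
    print(damped_step(quartic, [1.0], [-4.0]))
```

In the toy example the full step increases the loss, so the step is shortened once and the reduced step is accepted.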

Figures 6.4-6.6 show the SSE of the initial estimates from the OYW
method, the improved estimates using the method of Section 5, and the ML
estimates. The number of data points, etc. is the same as for figures 6.1-6.3,
and only the first parameters $\alpha_1$, $\phi_1$ and $\omega_1$ are shown. From Figure
6.6 we see that the method of Section 5 significantly improves the initial
frequency estimates, especially for low SNR. Moreover, the iterative ML
method provides significant improvement over the modified initial estimates.
Note that although amplitude and phase estimates sometimes become worse after
the initial frequency improvement, they become much better after the iterative
step. As a side note, Figure 6.6 shows that the iterative ML method performs
better than regular FFT-based methods, since the frequency estimates after the
improvement of section 5 are at least as good as FFT-based estimates.

From figures 6.1-6.3 it can be seen that the SSE of the ML estimates are
very close to the Cramer-Rao bound for SNR's above 0 dB. For the SNR of 0 dB,
the high SSE's are caused by convergence to a local minimum of LF in only 2 of

the 50 cases; and in these two cases, the $\hat\omega$ estimates were in error by less
than $0.006\pi$. Similar performance is evident in Figures 6.7 and 6.8 for N =
1000 and N = 50 data points, respectively.

From these figures it is apparent that there is an SNR threshold above
which the ML estimator gives variances that agree closely with the CR lower
bounds. Moreover, this threshold decreases with increasing number of data
points. This latter fact is evident from figures 6.3-6.5 where the threshold
is 5 dB for N = 50 data points, 2.5 dB for N = 500, and 0 dB (or lower) for
N = 1000.

Figure 6.9 shows results for N = 500 data points when the frequency
difference between $\omega_1$ and $\omega_2$ varies. Specifically, $\omega_1$ is fixed at $0.4\pi$,
and $\omega_2$ is varied from $0.2\pi$ to $0.4\pi$. The amplitudes and phases are the same
as before, and the SNR is 10 dB. When $\omega_2 = 0.375\pi$, the ML method fails to
yield better average results than the initial guess; however, these poor
averages are caused by failure of the ML method to improve the estimates in
only 5-10 of the 50 cases.

In Figure 6.10 we show average error results for N = 500 data points when
the additive noise is colored. The noise used is MA(1):

    $n(t) = [e(t) + 0.9\,e(t-1)] / \sqrt{1.81}$

Note that n(t) has the same total power as e(t) does. It can be seen in
Figure 6.10 that the ML method provides significant improvement over
Yule-Walker estimates even for colored noise. In fact, the errors in the colored
noise case are lower than when white noise was used. This is presumably due
to a lower asymptotic CR lower bound for the colored noise case. The
Yule-Walker method does not give consistent estimates in this case, because the
first row of equation (3.5) should not be used (the data can be modeled as a
limiting ARMA(4,5) process, in which case (3.5) holds only for k > 2).
However, for large L, the effect of the first equation is small, and
"reasonable" estimates could still result (as is seen in Figure 6.7). We note
also that for colored noise, the proposed method is not a maximum likelihood
estimate, but it is still an output error method.
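
The claim that the MA(1) noise has the same total power as e(t) follows from $(1 + 0.9^2)/1.81 = 1$, and can be confirmed by simulation. The sketch below assumes unit-variance Gaussian e(t); the function name, seed handling and sample size are illustrative choices, not taken from the paper.

```python
import math
import random


def ma1_noise(n, c=0.9, seed=0):
    """Generate n samples of n(t) = [e(t) + c*e(t-1)] / sqrt(1 + c^2),
    where e(t) is unit-variance white Gaussian noise."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    scale = math.sqrt(1.0 + c * c)
    return [(e[t] + c * e[t - 1]) / scale for t in range(1, n + 1)]


if __name__ == "__main__":
    x = ma1_noise(50000, seed=1)
    power = sum(v * v for v in x) / len(x)
    print("theoretical power:", (1.0 + 0.9 ** 2) / 1.81)
    print("sample power:     ", power)
```

The normalization by $\sqrt{1.81}$ only equalizes the total power; the spectrum of n(t) is still low-pass shaped, which is what makes the noise colored.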

The CPU time needed to obtain the ML estimates was about 10 times that
needed to obtain the initial estimates. (The initial estimates
required .08-.35 seconds and the ML estimates .70-4.0 seconds on a VAX 11-750
as N ranged from 100-1000). About one-half of the CPU time was spent
obtaining improved initial guesses via (5.4), and the other half was spent on
the actual function minimization. The minimization procedure rarely required
more than 3-4 iterations to achieve a tolerance of $10^{-4}$ to $10^{-6}$ (where no
element of $\hat\theta$ changed more than "tolerance" in one iteration).

As a final note, the recursive computation of $\sin\omega t$ or $\cos\omega t$ using
(3.11) required only about 1/6 the CPU time of direct computation. The error
between the recursively and directly computed values remained below $10^{-3}$ for
N < 1000 (using single precision arithmetic); a typical plot of the error is
shown in Figure 6.11.
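
A recursion of this kind can be illustrated as follows. Equation (3.11) is not reproduced in this part of the report, so the sketch assumes the standard second-order recursion $\sin\omega(t+1) = 2\cos\omega\,\sin\omega t - \sin\omega(t-1)$; it also runs in double precision, so the error it shows is far smaller than the single-precision figure quoted above.

```python
import math


def sines_recursive(omega, n):
    """Return sin(omega*t) for t = 1..n using
    s[t+1] = 2*cos(omega)*s[t] - s[t-1], with s[0] = 0, s[1] = sin(omega).
    Only one cosine and one sine evaluation are needed."""
    c2 = 2.0 * math.cos(omega)
    prev, cur = 0.0, math.sin(omega)
    out = []
    for _ in range(n):
        out.append(cur)
        prev, cur = cur, c2 * cur - prev
    return out


if __name__ == "__main__":
    omega, n = 0.4 * math.pi, 1000
    rec = sines_recursive(omega, n)
    err = max(abs(rec[t - 1] - math.sin(omega * t)) for t in range(1, n + 1))
    print("max abs error over", n, "samples:", err)
```

Each new sample costs one multiplication and one subtraction, which is where the saving over direct evaluation of the sine comes from.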


7.  CONCLUSIONS 


We  derived  a  (simplified)  Gauss-Newton  algorithm  for  estimating  the 
parameters  of  sinusoidal  signals  in  noise.  The  algorithm  is  based  on 
maximization  of  the  likelihood  function  and  is  initialized  by  a  set  of 
preliminary  estimates  obtained  via  the  overdetermined  Yule-Walker  method.  The 
asymptotic  properties  of  the  proposed  techniques  are  discussed  and  it  is  shown 
that  the  parameter  estimates  are  consistent  and  asymptotically  efficient  for 
the  Gaussian  case.  In  the  non-Gaussian  case  the  estimator  provides  a  minimum- 
variance  solution  within  a  large  class  of  estimators  based  on  second  order 
statistics. 


The  performance  of  the  proposed  technique  and  its  capability  for 
resolving  closely-spaced  sinusoids  were  studied  by  Monte-Carlo  simulations. 
It  was  shown  that  the  Gauss-Newton  procedure  performs  better  than  the 
overdetermined  Yule-Walker  method. 


APPENDIX  A 


CRAMER-RAO  LOWER  BOUNDS 

The estimation problem formulated in section 2 falls into the class of
nonlinear regression problems. The CRLB, say $P^N_{CR}$, for any unbiased
estimator of $\theta$ and $\lambda^2$ can be easily derived [7]. In this appendix we will be
interested in the asymptotic CRLB: $P^\infty_{CR}$. The reason for this interest is
threefold:


(i) $P^\infty_{CR}$ has a much simpler expression than $P^N_{CR}$ and is, therefore, much
easier to compute. Yet $P^\infty_{CR}$ is a good approximation of $P^N_{CR}$ whenever

inf 

i^tj 


(A.l) 


This will become apparent in the following, where it will be shown that the
smaller the minimum separation in frequency $\inf_{i \ne j} |\omega_i - \omega_j|$, the slower is the
convergence of $P^N_{CR}$ to $P^\infty_{CR}$. It is worth noting that a main conclusion of
the study of $P^N_{CR}$ in [7] was that $P^N_{CR}$ increases rapidly as the minimum
frequency separation goes below the critical value $2\pi/N$. In such a case
$P^N_{CR}$ is much larger than $P^\infty_{CR}$.


(ii) $P^N_{CR}$ can be attained only under certain restrictive conditions [10]
which apparently are not satisfied for the problem under study. On the other
hand, $P^\infty_{CR}$ is attained in the limit (as $N \to \infty$) by the covariance matrix of
the ML estimate; see theorem 4.1. Furthermore, for other estimation methods
(such as the OYW method) only asymptotic results are available. Thus, it is
$P^\infty_{CR}$ which is of interest in any analytical study comparing the performance
of the ML method with that of other estimation methods.


(iii) The expression of $P^\infty_{CR}$ is useful in the derivation of the
simplified ML Gauss-Newton algorithm in section 4. Note that an expression
for $P^\infty_{CR}$ does not seem to be available in the literature, except for the
special case of m = 1; see [9] and its references.

For the estimation problem under discussion, the log-likelihood function
is given by

    $L(\theta, \lambda^2) = -\frac{N}{2} \ln \lambda^2 - \frac{1}{2\lambda^2} \sum_{t=1}^{N} e^2(t)$ ,

where

    $e(t) = y(t) - \sum_{i=1}^{m} \alpha_i \sin(\omega_i t + \phi_i)$

can be evaluated by straightforward calculations:
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(A.2) 


(A. 3) 


(A.4) 


(A.S) 


(A. 6) 
In (A.5) and (A.6) we used the assumption that e(t) is white Gaussian
noise. It follows that


^CR 


CR 


2x 
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where 


=  ,2 
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■t=l 


(A. 8) 


and where the derivatives of e(t) with respect to the parameters
$\{\alpha_i, \phi_i, \omega_i\}$ are given by (4.3). The expression (A.8) for $P^N_{CR}$ appears, for
example, in [7]. However, the calculations necessary to show that the CRLB matrix has
the block-diagonal form of (A.7), which in turn implies that $P^N_{CR}$ is given
by (A.8), were not included there.


In the following we will study the limit of $P^N_{CR}$ as $N \to \infty$. The
following results will be useful for this study.


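Lemma A.1.


For $\omega \in [0, 2\pi)$,

    $\frac{1}{N} \sum_{t=1}^{N} \cos(\omega t + \phi) = \begin{cases} \cos\phi & \text{for } \omega = 0 \\ \dfrac{1}{N}\, \dfrac{\sin(N\omega/2)\, \cos\left(\frac{N+1}{2}\omega + \phi\right)}{\sin(\omega/2)} & \text{for } \omega \ne 0 \end{cases}$    (A.9)


Proof: [17].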
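
The identity (A.9) is easy to check numerically. The following sketch compares the direct average with the closed form for a few parameter values; the function names are illustrative.

```python
import math


def mean_cos_direct(omega, phi, n):
    """Left-hand side of (A.9): (1/N) * sum_{t=1}^{N} cos(omega*t + phi)."""
    return sum(math.cos(omega * t + phi) for t in range(1, n + 1)) / n


def mean_cos_closed(omega, phi, n):
    """Right-hand side of (A.9)."""
    if omega == 0.0:
        return math.cos(phi)
    return (math.sin(n * omega / 2.0)
            * math.cos((n + 1) * omega / 2.0 + phi)
            / (n * math.sin(omega / 2.0)))


if __name__ == "__main__":
    for omega, phi, n in [(0.0, 0.3, 10), (0.3, 0.2, 50), (2.5, -1.0, 400)]:
        print(omega, phi, n,
              mean_cos_direct(omega, phi, n), mean_cos_closed(omega, phi, n))
```

For fixed $\omega \ne 0$ the closed form is bounded in magnitude by $1/(N \sin(\omega/2))$, so the average tends to zero at a rate governed by $N\omega$ rather than by N alone when $\omega$ is small.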


Corollary.


For $\omega \in [0, 2\pi)$,
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k  >  0 


(A. 10) 
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Ul  ^  0  , 


Proof: For k = 0 the limit follows immediately from (A.9). For k > 0 the
limits follow from relations similar to (A.9) obtained by differentiation of
(A.9) with respect to $\omega$.


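Let us denote

    $\bar P^N_{CR} = K_N\, P^N_{CR}\, K_N$

where $K_N$ is given by (4.4). Clearly $\bar P^N_{CR}$ is the CRLB on the covariance
matrix of the following normalized estimation error vector

    $\begin{bmatrix} \sqrt{N}(\hat\alpha - \alpha) \\ \sqrt{N}(\hat\phi - \phi) \\ N\sqrt{N}(\hat\omega - \omega) \end{bmatrix}$    (A.12)

where $\alpha = [\alpha_1 \ldots \alpha_m]^T$ and $\hat\alpha$ is any unbiased estimator of $\alpha$,
and $\phi$, $\hat\phi$, $\omega$, $\hat\omega$ are similarly defined. In the following we will show that

    $P^\infty_{CR} = \lim_{N\to\infty} \bar P^N_{CR}$    (A.13)

exists and has a simple expression.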


By making repeated use of Lemma A.1 and its corollary we can write, see
(A.9),

where $\delta_{i,j}$ denotes the Kronecker delta (3.3b). Therefore,

which after some straightforward calculations gives

Note that the bounds for phases and frequencies are proportional to the
noise-to-signal ratios corresponding to the frequency in question. However,
somewhat contrary to intuition, the bound for the amplitudes of the sinusoids
is independent of these amplitudes. Note also the almost diagonal structure
of $P^\infty_{CR}$. The estimation errors of the phase and frequency of the same
sinusoid are asymptotically cross-correlated. All the other estimation errors
are asymptotically uncorrelated.

It is also interesting to note that the bounds for $\{\omega_i\}$ are of order
$1/N^3$ (see also [9] and its references). This order of the CRLB is rather
unusual for a stationary estimation problem for which the corresponding bounds
are in general of the order $1/N$. However, the problem of estimating the
parameters of a sinusoidal signal is not a strictly stationary estimation
problem: the derivative of e(t) with respect to $\omega_i$ is clearly a
nonstationary signal.


It follows from Lemma A.1 that the smaller the minimum frequency
separation $\inf_{i \ne j} |\omega_i - \omega_j|$, the slower is the convergence in (A.13). Consider, for
example, (A.9) for $\omega$ small but non-zero. Then the left-hand side of (A.9)
will generally be small provided that $N\omega$, rather than N, is large enough;
see the right-hand side of (A.9).
APPENDIX B


Since estimates of $\omega$, $\phi$, and $\alpha$ are $O(1/\sqrt{N})$ or $O(1/\sqrt{N^3})$, we restrict
attention to the case $\hat\omega \approx \omega$, $\hat\alpha \approx \alpha$, and $\hat\phi \approx \phi$. From (5.3) the derivative
of F with respect to $\hat\omega$ is
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I  t[aSin((ut+,^)  -  oSin((at+(^)  ][-2aCOs((ut+(^)  ] 


N  A  ^  ^2  H 
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(B.lb) 


We claim that the zeroes of $\partial F/\partial\hat\omega$ in the region of interest are nearly equal
to those of the third term of (B.1b). To support this, a plot of

    $\frac{1}{N^2} \sum_{t=1}^{N} t \sin\tilde\omega t$

for N = 100 is shown in Figure B.1. It can be seen that for $\tilde\omega$ not near zero,
this function is near zero. Also, near $\tilde\omega = 0$ the zero crossings have large
slopes and are therefore insensitive to small additive disturbances. Defining
$\tilde\omega \triangleq \hat\omega - \omega$ and $\tilde\phi \triangleq \hat\phi - \phi$, the third term in (B.1b) can be expressed as


*  N  .  . 

■  It  ^  tsin(oit  +  ^) 

N  .  N 

=  -aa[cost  (  I  tsinjt)  +  sin5^(^  I  t  cos  utj] 


(B.3) 


Since $\tilde\phi$ is $O(1/\sqrt{N})$, the second term in (B.3) can be neglected. Thus for
$\hat\omega \approx \omega$, $\hat\alpha \approx \alpha$ and $\hat\phi \approx \phi$, the zeroes of $\partial F/\partial\hat\omega$ are nearly those of the
function

It is not difficult to see that (B.4) is zero for $\tilde\omega = 0$. Moreover, for

0^  <  u  <  5-  ,  (B.4)  is  positive  (since  each  element  of  the  sum  is  positive) 
~  ,  Zir 


For 

for 
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(B.4)  is  negative;  thus  the  first  positive  zero  of  (B.4)  occurs 


S  E  [^  ,  ^]  •  Since  (B.4)  is  an  even  function  of  2  ,  and  since  the 
ABSTRACT


The paper discusses the problem of detecting transient signals of unknown
waveforms in white Gaussian noise. The signals are modeled as impulse
responses of rational transfer functions with unknown parameters. A
generalized likelihood ratio test (GLRT) is proposed and its statistical
properties are analyzed for both known and unknown noise variances. The GLRT
involves constrained maximum likelihood estimation of the signal parameters.
The performance of the GLRT is compared to that of an optimal matched filter
and an energy detector, for some test cases. Also, the theoretical
distributions of the likelihood ratios under $H_0$ and $H_1$ are compared to
experimental distributions obtained by Monte-Carlo simulations.


This work was supported by the Army Research Office under Contract No.
DAAG29-83-C-0027.


1.  INTRODUCTION 


The problem of detecting transient signals with unknown waveforms arises
in the areas of seismic signal processing, underwater surveillance, and other
engineering applications. Some common approaches to the problem include:
(i) Energy detection, i.e., comparison of the total signal energy to a
threshold; (ii) Noncoherent detection, i.e., computation of the magnitudes
of the Fourier coefficients of the signal and then comparing a weighted sum of
these coefficients to a threshold [1]; (iii) Modeling of the signal as a
linear combination of known waveforms with unknown coefficients, see e.g.,
[2]. In this paper we propose an adaptive detection scheme for transient
signals, based on modeling of the signal as the impulse response of a rational
transfer function. The coefficients of the transfer function are assumed
unknown, and are estimated by the detector. It is shown in the paper that
when the signal is absent, the parameter estimation problem becomes
ill-posed. This difficulty is solved by introducing a constrained maximum
likelihood estimator. A generalized likelihood ratio test (GLRT) is then
introduced, and its statistical properties are analyzed. It is shown that the
GLRT is approximately distributed as a quadratic form [3, ch. 29]. Under $H_0$,
the quadratic form is central, while under $H_1$ it is noncentral and also
contains a bias term. When the noise variance is unknown, the distribution of
the GLRT involves ratios of quadratic forms (central under $H_0$ and
noncentral under $H_1$). Expressions for the weights, the noncentrality parameters
and the bias are given as functions of the signal parameters.

The performance of the GLRT for the case of known noise variance is
examined for some specific test cases, and theoretical performance curves are
shown. Finally, the theoretical distributions of the likelihood ratio are
compared to experimental distributions obtained by Monte-Carlo simulations.


2.  THE  GENERALIZED  LIKELIHOOD  RATIO  DETECTOR 


The detection problem considered in this section is as follows.

    $H_0:\ y = v$
    $H_1:\ y = m + v$ ,    (1)

where y, m and v are N-dimensional vectors, and where v is a zero mean white
Gaussian noise whose variance $\sigma^2$ is assumed to be known. The signal $m(\theta)$ is
assumed to depend on a p-dimensional vector $\theta \in \Theta$, where p < N. We make the
following assumptions on $m(\theta)$:

(i) The functional dependence of m on $\theta$ is known, but the value of
$\theta$ is unknown.

(ii) The N x p Jacobian matrix

    $M(\theta) \triangleq \frac{\partial m(\theta)}{\partial\theta}$ ,    (2)

exists for all $\theta \in \Theta$. The rank of this matrix will be denoted by
$r(\theta)$. Note that $r(\theta)$ may vary for different values of $\theta$, but
we always have $r(\theta) \le p$.

(iii) $m(\theta) = 0$ if and only if $\theta = 0$. This assumption enables us to
replace (1) by the equivalent detection problem:

    $y = m(\theta) + v; \qquad H_0:\ \theta = 0, \qquad H_1:\ \theta \ne 0$    (3)

(iv) $m(\theta)$ belongs to the column space of $M(\theta)$ for all $\theta$.
Equivalently, the projection of $m(\theta)$ on the column space of
$M(\theta)$ is $m(\theta)$ itself. Later we will show that this condition is
satisfied when $m(\theta)$ is the impulse response of a rational transfer
function.


The generalized likelihood ratio is defined as

    $G = \frac{P_{\hat\theta}(y)}{P_0(y)}$ ,    (4)

where $P_\theta(y)$ is the joint probability density function of y, given the
parameter $\theta$. The vector $\hat\theta$ is an estimate of $\theta$ whose exact nature will be
specified later.


THE ESTIMATE $\hat\theta$


It would be natural to let $\hat\theta$ be the maximum likelihood estimate of
$\theta$. The log likelihood function is given by

    $\log P_\theta(y) = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} [y - m(\theta)]^T [y - m(\theta)]$ .    (5)

The maximum likelihood estimate is obtained by solving the likelihood
equations

    $\frac{\partial \log P_\theta(y)}{\partial\theta} = \frac{1}{\sigma^2} M^T(\theta) [y - m(\theta)] = 0$ .    (6)

Unfortunately, the matrix $M(\theta)$ need not have full rank for all $\theta$. In
particular, $M(\theta)$ may be rank deficient at $\theta = 0$, i.e., under $H_0$ (later we
will see that when the signal is a rational impulse response, the rank of M(0)
is exactly half the number of parameters). Clearly, the number of independent
equations in (6) is less than or equal to the rank of $M(\theta)$. Hence, when
$M(\theta)$ is rank deficient, the number of independent equations in (6) will be
smaller than the number of unknowns. Then the likelihood equation will have
an infinite number of solutions. Since equation (6) is nonlinear in $\theta$, it
is difficult to characterize the set of all possible solutions $\Theta^*$. However,
it is very likely that the norms of the members of $\Theta^*$ will be unbounded, i.e.,

    $\sup_{\theta \in \Theta^*} \|\theta\| = \infty$ .

In this case any attempt to solve the likelihood equations (6) would lead to
severe numerical problems, e.g., singularities and overflows. To circumvent
this difficulty, we will therefore modify the log likelihood function (5) by
subtracting a term proportional to the squared norm of $\theta$, i.e., we maximize

    $\log P_\theta(y) - \frac{\mu}{2}\theta^T\theta = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} [y - m(\theta)]^T [y - m(\theta)] - \frac{\mu}{2}\theta^T\theta$ .    (7)

This leads to the modified likelihood equations

    $\frac{1}{\sigma^2} M^T(\theta) [y - m(\theta)] - \mu\theta = 0$ ,    (8)

where $\mu$ is some positive scalar, serving to constrain the norm of $\theta$.
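
The effect of the constraint term is easiest to see in a linear special case, which is an assumption made here purely for illustration (the signal model of the paper is nonlinear). If $m(\theta) = M\theta$ with a constant, rank-deficient M, then (6) becomes the singular system $M^T M \theta = M^T y$, whereas (8) becomes $(M^T M + \sigma^2\mu I)\theta = M^T y$, which always has a unique solution. The sketch below uses a hypothetical two-parameter helper.

```python
def penalized_estimate(M, y, sigma2_mu):
    """Solve (M^T M + sigma2_mu * I) theta = M^T y for a two-column M
    by Cramer's rule. sigma2_mu plays the role of sigma^2 * mu in (8);
    sigma2_mu = 0 gives the unmodified likelihood equations (6)."""
    a11 = sum(r[0] * r[0] for r in M) + sigma2_mu
    a12 = sum(r[0] * r[1] for r in M)
    a22 = sum(r[1] * r[1] for r in M) + sigma2_mu
    b1 = sum(r[0] * v for r, v in zip(M, y))
    b2 = sum(r[1] * v for r, v in zip(M, y))
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ZeroDivisionError("normal equations are singular")
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]


if __name__ == "__main__":
    # Both columns of M are identical, so only theta_1 + theta_2 is determined.
    M = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
    y = [2.0, 2.0, 2.0]
    print(penalized_estimate(M, y, 0.01))
```

With the penalty the solver returns the small-norm solution with both components close to 1, instead of failing on the singular system.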


FIRST ORDER APPROXIMATION OF $\hat\theta$

Let us assume that

    $\hat\theta = \theta + \Delta\theta$ ,    (9)

where $\Delta\theta$ is sufficiently small, so that the following approximations can be
made.

    $m(\hat\theta) \approx m(\theta) + M(\theta)\Delta\theta$    (10)

    $M(\hat\theta) \approx M(\theta)$ .    (11)

The likelihood equations (8) can then be approximated by

    $\frac{1}{\sigma^2} M^T(\theta) [m(\theta) + v - m(\theta) - M(\theta)\Delta\theta] - \mu\theta - \mu\Delta\theta = 0$ .    (12)

Hence

    $\Delta\theta \approx [M^T(\theta)M(\theta) + \sigma^2\mu I_p]^{-1} [M^T(\theta)v - \sigma^2\mu\theta]$ .    (13)

Next we derive approximations for $M(\theta)\Delta\theta$ and for $[y - m(\hat\theta)]$. It is
convenient to use the singular value decomposition (SVD) of $M(\theta)$, given by

    $M(\theta) = U(\theta) \begin{bmatrix} \Lambda(\theta) & 0 \\ 0 & 0 \end{bmatrix} V^T(\theta)$    (14)

where $\Lambda(\theta)$ is a diagonal matrix of dimension $r(\theta) \times r(\theta)$, whose elements
are all positive. The matrices $U(\theta)$ and $V(\theta)$ are orthogonal matrices of
dimensions N x N and p x p respectively. We will usually omit the dependence on
$\theta$ for notational convenience.

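Substituting (14) in (13), we can express $\Delta\theta$ as

    $\Delta\theta \approx V \begin{bmatrix} (\Lambda^2 + \sigma^2\mu I_r)^{-1}\Lambda & 0 \\ 0 & 0 \end{bmatrix} U^T v - V \begin{bmatrix} \sigma^2\mu(\Lambda^2 + \sigma^2\mu I_r)^{-1} & 0 \\ 0 & I_{p-r} \end{bmatrix} V^T \theta$    (15)

Also,

    $M\Delta\theta \approx U \begin{bmatrix} (\Lambda^2 + \sigma^2\mu I_r)^{-1}\Lambda^2 & 0 \\ 0 & 0 \end{bmatrix} U^T v - U \begin{bmatrix} \sigma^2\mu(\Lambda^2 + \sigma^2\mu I_r)^{-1}\Lambda & 0 \\ 0 & 0 \end{bmatrix} V^T \theta$    (16)

    $y - m(\hat\theta) \approx m(\theta) + v - m(\theta) - M(\theta)\Delta\theta = v - M(\theta)\Delta\theta$

    $\qquad = U \begin{bmatrix} \sigma^2\mu(\Lambda^2 + \sigma^2\mu I_r)^{-1} & 0 \\ 0 & I_{N-r} \end{bmatrix} U^T v + U \begin{bmatrix} \sigma^2\mu(\Lambda^2 + \sigma^2\mu I_r)^{-1}\Lambda & 0 \\ 0 & 0 \end{bmatrix} V^T \theta$

    $\qquad = U A U^T v + U b$ ,    (17)

where

    $A(\theta) \triangleq \begin{bmatrix} \sigma^2\mu(\Lambda^2 + \sigma^2\mu I_r)^{-1} & 0 \\ 0 & I_{N-r} \end{bmatrix}$ ; $\qquad b(\theta) \triangleq \begin{bmatrix} \sigma^2\mu(\Lambda^2 + \sigma^2\mu I_r)^{-1}\Lambda & 0 \\ 0 & 0 \end{bmatrix} V^T \theta$    (18)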
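FIRST ORDER APPROXIMATION OF THE LIKELIHOOD RATIO


The statistic G defined in (4) is given by

    $G = \exp\left\{ \frac{1}{2\sigma^2} \left[ y^T y - (y - m(\hat\theta))^T (y - m(\hat\theta)) \right] \right\}$    (19)

It is convenient to define the likelihood ratio as twice the logarithm of
G, i.e., as

    $L \triangleq 2 \log G = \frac{1}{\sigma^2} \left\{ y^T y - [y - m(\hat\theta)]^T [y - m(\hat\theta)] \right\}$    (20)

We now use the result (17) to derive first order approximations for L under
$H_0$ and under $H_1$.

Under $H_0$:

    $L_0 \approx \frac{1}{\sigma^2} v^T v - \frac{1}{\sigma^2} v^T U A^2 U^T v = \frac{1}{\sigma^2} w^T [I_N - A^2] w$ ,    (21)

where

    $w \triangleq U^T v$ .    (22)

Since U is orthogonal, w is a zero-mean white Gaussian noise with variance
$\sigma^2$. Also note that

    $I_N - A^2 = \mathrm{diag}\left\{ \frac{\lambda_k^2(\lambda_k^2 + 2\sigma^2\mu)}{(\lambda_k^2 + \sigma^2\mu)^2},\ k = 1,\ldots,r;\ 0,\ldots,0 \right\}$    (23)

Hence, under $H_0$, L is approximately a central quadratic form

    $L_0 \approx \sum_{k=1}^{r} \frac{\lambda_k^2(\lambda_k^2 + 2\sigma^2\mu)}{(\lambda_k^2 + \sigma^2\mu)^2} \left( \frac{w_k}{\sigma} \right)^2$    (24)

Recall that the rank r and the singular values $\{\lambda_1, \lambda_2, \ldots, \lambda_r\}$ are those of
the matrix M(0). Note that under high signal-to-noise conditions, i.e.,
when $\sigma^2\mu \ll \lambda_k^2$, we have

    $L_0 \approx \sum_{k=1}^{r} \left( \frac{w_k}{\sigma} \right)^2$    (25)

Thus, $L_0$ has approximately a central Chi-squared distribution with r degrees
of freedom.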
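
The behaviour of the weights in the quadratic form (24) can be illustrated with a few numbers. The sketch below computes $\lambda_k^2(\lambda_k^2 + 2\sigma^2\mu)/(\lambda_k^2 + \sigma^2\mu)^2$ for given singular values; the function name and the sample values are illustrative and not taken from the paper.

```python
def quadratic_form_weights(singular_values, sigma2_mu):
    """Weights of (w_k/sigma)^2 in (24) for each singular value lambda_k,
    with sigma2_mu standing for the product sigma^2 * mu."""
    weights = []
    for lam in singular_values:
        lam2 = lam * lam
        weights.append(lam2 * (lam2 + 2.0 * sigma2_mu) / (lam2 + sigma2_mu) ** 2)
    return weights


if __name__ == "__main__":
    # Strong, moderate and weak singular values with sigma^2 * mu = 1.
    print(quadratic_form_weights([10.0, 1.0, 0.1], 1.0))
```

Each weight equals $1 - [\sigma^2\mu/(\lambda_k^2 + \sigma^2\mu)]^2$, so it lies between 0 and 1 and approaches 1 when $\sigma^2\mu \ll \lambda_k^2$, which recovers the Chi-squared approximation (25).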

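Under $H_1$:

    $L_1 \approx \frac{1}{\sigma^2} \left\{ (v + m)^T (v + m) - [U A U^T v + U b]^T [U A U^T v + U b] \right\}$

    $\qquad = \frac{1}{\sigma^2} \left\{ (w + q)^T (w + q) - (A w + b)^T (A w + b) \right\}$ ,    (26)

where

    $q \triangleq U^T m$ .    (27)

To further simplify this expression we note that by assumption (iv),

    $U \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} U^T m = m$ ,    (28a)

or equivalently,

    $\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} q = q$    (28b)

The left hand side of (28a) is just the projection of m on the column space
of M. Equation (28b) implies that only the first r components of q are
nonzero. The same is true for b, as can be easily verified from (18).
Therefore we can express L in the form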


L  -  ^  2(5.-A^\+  gjg.-  b\} 

c 

V  w 

\l^  +  Y  , 


(29a) 


where 


C1& 


2  2  2 
X^(x^+2a%] 

~7-7  ,2 

(X|(+a  u) 

(o^u)^X|j  j 

TT^T~J  ®^icl 

(Xj^  u) 


Y  » 


(29b) 


{29c) 


(29d) 


Hence, under $H_1$, L is a sum of a noncentral quadratic form and a bias term.
Note that for high SNR we have $s_k \approx 1$, $u_k \approx q_k/\sigma$ and $\gamma \approx 0$. Therefore,

    $L_1 \approx \sum_{k=1}^{r} \left( \frac{w_k}{\sigma} + \frac{q_k}{\sigma} \right)^2$    (30)

which has a non-central Chi-squared distribution with r degrees of freedom and
non-centrality parameter

    $q^T q / \sigma^2 = m^T m / \sigma^2 \triangleq \mathrm{SNR}$.    (31)

3. THE GENERALIZED LIKELIHOOD RATIO DETECTOR:
CASE OF UNKNOWN NOISE VARIANCE

In this section we consider a similar problem to the one discussed in the
previous section, except that we now assume the noise variance $\sigma^2$ to be
unknown. Assumptions (i)-(iv) will be made, as before.

The generalized likelihood ratio is now defined as

    $G = \frac{P_{\hat\theta, \hat\sigma^2}(y)}{P_{0, \hat\sigma_0^2}(y)}$ ,    (32)

where $P_{\theta, \sigma^2}(y)$ is the joint probability density function of y, given the
parameters $\theta$ and $\sigma^2$. The quantities $\hat\theta$ and $\hat\sigma^2$ are estimates of
$\theta$ and $\sigma^2$ respectively, while $\hat\sigma_0^2$ is an estimate of $\sigma^2$, given that
$\theta = 0$. These estimates are discussed below.


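THE ESTIMATE $\hat\sigma_0^2$


Under the assumption that $\theta = 0$, the log likelihood function is given
by

    $\log P_{0, \sigma^2}(y) = -\frac{N}{2} \log 2\pi - \frac{N}{2} \log \sigma^2 - \frac{y^T y}{2\sigma^2}$    (33)

To find $\hat\sigma_0^2$ we differentiate (33), equate the derivative to zero and solve for
$\sigma^2$. This yields

    $\hat\sigma_0^2 = \frac{1}{N}\, y^T y$ .    (34)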

THE ESTIMATES $\hat\sigma^2$, $\hat\theta$

As explained in the previous section, the likelihood function has to be
modified by a constraint term, in order to avoid unboundedness of $\hat\theta$ when $H_0$
is true. We therefore use

    $\log P_{\theta, \sigma^2}(y) - \frac{\mu}{2}\theta^T\theta = -\frac{N}{2} \log 2\pi - \frac{N}{2} \log \sigma^2 - \frac{1}{2\sigma^2} [y - m(\theta)]^T [y - m(\theta)] - \frac{1}{2}\mu\theta^T\theta$    (35)

The resulting likelihood equations are

    $\hat\sigma^2 = \frac{1}{N} [y - m(\hat\theta)]^T [y - m(\hat\theta)]$    (36a)

    $M^T(\hat\theta) [y - m(\hat\theta)] - \hat\sigma^2 \mu \hat\theta = 0$    (36b)

When first order approximations are assumed, as in the previous section, it is
found that

    $\Delta\theta \approx [M^T(\theta)M(\theta) + \sigma^2\mu I_p]^{-1} [M^T(\theta)v - \sigma^2\mu\theta]$.    (37)

This is the same expression as (13), i.e., a small error in $\hat\sigma^2$ does not
change the estimate $\hat\theta$ up to a first order.


FIRST  ORDER  APPROXIMATION  OF  THE  LIKELIHOOD  RATIO 


The  statistic  G  defined  In  (32)  Is  given  by 


(2ira^)“^^^exp{-  I*}  .. 


It  Is  convenient  to  define  the  likelihood  ratio  as  the  following  monotone 
function  of  G: 


*2  *2  *2  *2  *2 
.  _  ,2/N  ,  _  «0  ,  ®0  ’  ^0  ' 

L2*G  -  - =2” 

u  a  ®0  ■  ^’0  ■  ®  ) 


where 


ffQ  ‘  “  (lV-  [JL-  E(9)]^[L-  E(8)]}/ff^ 


[• 


Using  the  derivations  in  the  previous  section  we  can  write 


Under  Hq: 


(!!.Vw 


^  '^k  2 

- 2~-2 - 2 - 

!!  ,'^k.2  r  ^kU|c  +  2a  u)  W,^  2 

k-V"'  'k^i  IlfrZF 


where $w$, $\Lambda$, $\lambda_k$ are as defined in section 2. Note that under high signal-to-noise conditions, i.e., when $\sigma^2\mu \ll \lambda_k$, we have

'"*k  2 

'■2  •  r^-  '«> 

k^r+f'^  ^ 

Thus, at high SNR $L_2$ is proportional to a random variable with a central $F_{r,N-r}$ distribution.

Under $H_1$:


{(w+a)^(w+3j  -  (Avj+^)^(Aw+b)}/a“ 

(^q_)  ^(w+q_)/a^  -  {(w+g)^  (w+q)-(Aw+^)^  (A^b)}/a^ 

w,  -t-q.  _  r  w, 

k.V^’ 


■y  v*«  '  *«  j*"^  * 


p;*? 
where  w  ,  q_  ,  A,  ^  ,  Sj^,  Uj^  and  y  are  as  defined  in  section  2.


Under  high  SNR  conditions  we  have:  ^  1,  uj^  »  q^^/a  ,  y  »  0  .  Thus, 


^  \  ‘’v  2 

*  T  wjj  q. 


(44) 


Thus, at high SNR $L_2$ is proportional to a random variable with a noncentral $F_{r,N-r}$ distribution with noncentrality parameter

$$\frac{q^Tq}{\sigma^2} = \frac{m^Tm}{\sigma^2} . \qquad (45)$$


Note also that for high SNR and $N \to \infty$ we have

$$H_0:\ L_2 \sim \chi_r^2 \quad \text{(central chi-squared with } r \text{ degrees of freedom)}$$

$$H_1:\ L_2 \sim \chi_r^2(\mathrm{SNR}) \quad \text{(noncentral chi-squared with noncentrality parameter SNR)} \qquad (46)$$


In general (when the SNR is not sufficiently large) $L_2$ will be a monotonic function of a ratio of quadratic forms


'  a  fc 

Ua  ■  — 

^  1  +  L. 


(47) 


where 
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II 
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r 
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central  ratio  of  quadratic  forms 


ti 


2  2  2 

r  \|^(X|j+2<j  ii)  Wj^  2 
k-l  (X^  +  a^u)^  ^ 


=  4 


Ic-l 


u 


c 

(48) 


noncentral  ratio  of  quadratic 
forms. 
4. THE GLRT IN THE CASE OF RATIONAL SIGNAL

Next we consider the case where the signal $m(\theta)$ can be modeled as the impulse response of a rational transfer function, i.e.

$$m_k(\theta) = \frac{1}{2\pi j}\oint \frac{b(z)}{a(z)}\, z^{k-1}\, dz , \quad 1 \le k \le N , \qquad (49)$$


where

$$\frac{b(z)}{a(z)} = \frac{b_1 z^{n-1} + \cdots + b_n}{z^n + a_1 z^{n-1} + \cdots + a_n} . \qquad (50)$$


The polynomials $a(z)$ and $b(z)$ are assumed to be coprime; $a(z)$ is assumed to be stable. The parameter vector is the $2n$-dimensional vector

$$\theta = [a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n]^T . \qquad (51)$$

In order to comply with assumption (iii), we assume that $b(z) = 0 \Rightarrow a(z) = z^n$, i.e., if all $\{b_k\}$ are zero, then all $\{a_k\}$ are necessarily zero.

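First we show that assumption (iv) is satisfied for the model (49). We have

$$\frac{\partial m_k}{\partial a_i} = -\frac{1}{2\pi j}\oint \frac{z^{n-i}\, b(z)}{a^2(z)}\, z^{k-1}\, dz , \qquad (52a)$$

$$\frac{\partial m_k}{\partial b_i} = \frac{1}{2\pi j}\oint \frac{z^{n-i}}{a(z)}\, z^{k-1}\, dz . \qquad (52b)$$

Hence

$$\sum_{i=1}^{n} b_i \frac{\partial m_k}{\partial b_i} = \frac{1}{2\pi j}\oint \frac{\sum_{i=1}^{n} b_i z^{n-i}}{a(z)}\, z^{k-1}\, dz = m_k , \qquad (53)$$

so that $m(\theta)$ is a linear combination of the columns of $M(\theta)$, as required.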

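Next let us compute the rank of $M(\theta)$ for $\theta = 0$ and for $\theta \ne 0$. For $\theta = 0$ we have:

$$M(0) = \begin{bmatrix} 0 & I_n \\ 0 & 0 \end{bmatrix} \begin{matrix} \}\ n \\ \}\ N-n \end{matrix} \qquad (54)$$

so that

$$r(0) = n . \qquad (55)$$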


Next we show that $r(\theta) = 2n$ for all $\theta \ne 0$. Assume the converse, i.e., that for some nonzero vector $\zeta = [c_1, c_2, \ldots, c_n, d_1, d_2, \ldots, d_n]^T$ we have

$$\frac{1}{2\pi j}\oint \frac{d(z)a(z) - c(z)b(z)}{a^2(z)}\, z^{k-1}\, dz = 0 , \quad 1 \le k \le N . \qquad (56)$$

This means that

$$d(z)a(z) - c(z)b(z) = 0 \qquad (57)$$

for some polynomials $\{c(z), d(z)\}$, contradicting the assumed coprimeness of $\{a(z), b(z)\}$.


Finally, we note that in the case of a rational signal, the likelihood ratio under $H_0$ is given by (cf. (24) and (55))

$$L = \frac{1 + 2\sigma^2\mu}{(1 + \sigma^2\mu)^2} \sum_{k=1}^{n} \left(\frac{w_k}{\sigma}\right)^2 . \qquad (58)$$

Therefore, under $H_0$, the likelihood ratio is proportional to a random variable whose distribution is $\chi^2$ with $n$ degrees of freedom.


Similarly, under $H_0$, $L_2$ is given by


, 2  nw 
l+2a  u  r  f 


I  (— }^+(l  -  o]  I 

k«n+r<^  ^  (1+ff^u)^  lc*r® 


1^2q^u 


(I+O^u)^  1  +  (1  -  2)F 

(1+0  u) 
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"  '^k  2 

I  (— ) 

^  fJ - kT" 

J 

k=n+l 

Thus, under $H_0$, $L_2$ is a function of a random variable with an $F_{n,N-n}$ distribution. Note that when $\sigma^2\mu \ll 1$


4  =  ^ 


n,M-n 


Equations (58), (59) can be used to compute the thresholds of the detector for a specified probability of false alarm $P_{FA}$.

Under $H_1$ the GLRT $L$ is a quadratic form (cf. (28)-(31)) and $L_2$ is a function of a ratio of quadratic forms (cf. (47), (48)). The probability distribution function of the first can be computed, given the weights $\{g_k\}$ and the noncentralities, using one of the methods suggested in [3]. The probability distribution function of the second is difficult to compute in general. In the case of high SNR, $L_2$ has approximately an $F_{n,N-n}$ distribution, cf. (42), (44):

$$H_0:\ L_2 \sim F_{n,N-n} \quad \text{(central F-distribution)}$$

$$H_1:\ L_2 \sim F'_{n,N-n} \quad \text{(non-central F-distribution)} \qquad (62)$$


5.  NUMERICAL  EXAMPLES 


In this section we illustrate the performance of the GLRT in two test cases. In both cases the signal is modeled by the second-order transfer function

$$\frac{b(z)}{a(z)} = \frac{b_1 z + b_2}{z^2 + a_1 z + a_2} . \qquad (63)$$

In the first test case $a(z) = z^2 - 1.4z + 0.95$, while in the second case $a(z) = z^2 - 1.3z + 0.75$. The parameter $b_1$ was computed according to the desired signal energy in each example. The parameter $b_2$ is identically zero.
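The following sketch shows one way the two test signals could be generated and scaled. It assumes that the signal samples are the coefficients of $z^{-k}$ in the expansion of $b_1 z / (z^2 + a_1 z + a_2)$, that the first test case is the narrowband one, and that $N = 60$ samples and a target energy of 10 are used; the function names and these numerical choices are illustrative only.

```python
import math

def rational_signal(a1, a2, b1, n_samples):
    """Samples m_1..m_N of the impulse response of b1*z / (z^2 + a1*z + a2)."""
    m = []
    prev1, prev2 = 0.0, 0.0  # m_{k-1} and m_{k-2}
    for k in range(1, n_samples + 1):
        # Recursion m_k = -a1*m_{k-1} - a2*m_{k-2}, driven by b1 at k = 1
        cur = -a1 * prev1 - a2 * prev2 + (b1 if k == 1 else 0.0)
        m.append(cur)
        prev1, prev2 = cur, prev1
    return m

def scale_to_energy(a1, a2, n_samples, energy):
    """Choose b1 > 0 so that the sum of m_k^2 equals the requested energy."""
    unit = rational_signal(a1, a2, 1.0, n_samples)
    return math.sqrt(energy / sum(v * v for v in unit))

for name, a1, a2 in [("narrowband", -1.4, 0.95), ("medium band", -1.3, 0.75)]:
    radius = math.sqrt(a2)  # modulus of the complex-conjugate pole pair
    b1 = scale_to_energy(a1, a2, 60, 10.0)
    m = rational_signal(a1, a2, b1, 60)
    print(name, round(radius, 4), round(sum(v * v for v in m), 6))
```

The pole modulus is closer to one in the first case, which is what makes its impulse response decay more slowly and its spectrum narrower.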


As is clear from the previous section, the GLRT for the case of unknown variance approaches the one for the case of known noise variance as the number of data $N$ becomes sufficiently large. Therefore, and due to the difficulty in computing the distribution of a ratio of quadratic forms, we settled for testing the case of known variance only.


Besides the GLRT, we examined two other detectors for comparison. The first is the matched filter (MF)

$$L_{MF} = \frac{1}{\sigma^2}\, m^T(\theta)\, y , \qquad (64)$$

where $m(\theta)$ is the true signal. The matched filter corresponds to a situation when the signal waveform is known, and represents an upper bound on the performance of any detector for the detection problem (1). The distribution of $L_{MF}$ under $H_0$ is normal with zero mean and variance $m^Tm/\sigma^2$, while the distribution under $H_1$ is normal with both mean and variance equal to $m^Tm/\sigma^2$. The second detector is the energy detector (ED)

$$L_{ED} = \frac{1}{\sigma^2}\, y^T y . \qquad (65)$$

The energy detector does not make any assumption on the signal, hence it represents a lower bound on the performance of any reasonable detector. It is not difficult to show that the distribution of $L_{ED}$ is $\chi^2$ with $N$ degrees of freedom. Under $H_0$ the distribution is central, while under $H_1$ it is noncentral, with noncentrality parameter $m^Tm/\sigma^2$.
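A quick numerical check of the energy detector statistics can be sketched as follows. The sketch assumes the statistic $y^Ty/\sigma^2$ and uses the standard facts that a central chi-squared variable with $N$ degrees of freedom has mean $N$, while the noncentral one has mean $N$ plus the noncentrality parameter. The constant test signal, the trial count and the seeds are arbitrary illustrative choices.

```python
import random

def energy_statistic(y, sigma):
    """Energy detector statistic y^T y / sigma^2."""
    return sum(v * v for v in y) / sigma ** 2

def mean_energy_statistic(signal, sigma, trials, seed=0):
    """Monte Carlo mean of the statistic for y = signal + white Gaussian noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        y = [s + rng.gauss(0.0, sigma) for s in signal]
        total += energy_statistic(y, sigma)
    return total / trials

N = 60
sigma = 1.0
signal = [0.5] * N  # hypothetical signal with energy 15, so the noncentrality is 15
mean_h0 = mean_energy_statistic([0.0] * N, sigma, 2000, seed=1)  # close to N
mean_h1 = mean_energy_statistic(signal, sigma, 2000, seed=2)     # close to N + 15
```

Because the mean under $H_1$ exceeds the mean under $H_0$ only by the signal energy, while the spread grows with $N$, the energy detector degrades as the record length increases for a fixed signal energy.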

First we compute the theoretical detection probabilities of the three detectors as a function of the signal to noise ratio, keeping the probability of false alarm fixed. For each detector, the corresponding threshold $t$ was computed by

$$t = F_0^{-1}(1 - P_{FA}) , \qquad (66)$$

where $F_0(\cdot)$ denotes the cumulative distribution function of the likelihood ratio in question under $H_0$ (normal for MF, $\chi_n^2$ for GLRT and $\chi_N^2$ for ED). The detection probability was then computed by

$$P_D = 1 - F_1(t) , \qquad (67)$$

where $F_1(\cdot)$ is the cumulative distribution function of the corresponding likelihood ratio under $H_1$. The quadratic form distribution of the GLRT was computed by numerical evaluation of the inverse Fourier transform of the characteristic function.
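For the matched filter the threshold and detection probability have a closed form, which the sketch below evaluates with the Python standard library. It assumes the matched filter statistic is normal with zero mean and variance SNR under $H_0$, and normal with mean and variance both equal to SNR under $H_1$, where SNR stands for $m^Tm/\sigma^2$. The function name and the SNR grid are illustrative choices.

```python
from statistics import NormalDist

def matched_filter_pd(snr, p_fa):
    """Detection probability of the matched filter for snr = m^T m / sigma^2 > 0."""
    std = NormalDist()
    scale = snr ** 0.5
    # Threshold t = F_0^{-1}(1 - P_FA), with F_0 the N(0, snr) distribution
    t = scale * std.inv_cdf(1.0 - p_fa)
    # P_D = 1 - F_1(t), with F_1 the N(snr, snr) distribution
    return 1.0 - std.cdf((t - snr) / scale)

# Detection probability versus SNR in dB at a false alarm probability of 0.01
pd_curve = [(snr_db, matched_filter_pd(10 ** (snr_db / 10), 1e-2))
            for snr_db in range(0, 21, 4)]
```

The same two-step recipe applies to the other detectors once their distributions under the two hypotheses are available numerically.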

Figure 1 shows the theoretical detection probabilities of the three detectors for the narrowband case as a function of the SNR, with $P_{FA} = 10^{-2}$ and $N = 60$. Figure 2 depicts a similar case, except that

Pp^  *  10“^  •  Figures  3  and  4  show  the  corresponding  results  for  the  medium 

band case, with $N = 10$. The SNR is defined here as the ratio of the total signal energy to the noise variance, i.e., $m^Tm/\sigma^2$. Note that the actual SNR is $m^Tm/(N\sigma^2)$. As expected, the performance of the GLRT is between those of the matched filter and the energy detector. In the narrowband case, the GLRT is much better than the ED, while in the medium band case the GLRT is approximately in the middle between ED and MF.


Next, we tested the behavior of the GLRT by Monte Carlo simulations, and compared the experimental distributions to the theoretical ones. In each case, 1000 Monte Carlo simulations were run. The constraint parameter $\mu$ was set to 5 in all cases. Figure 5 shows the theoretical and experimental distributions of the GLRT under $H_0$. The number of data points $N$ was 50. As we see, the experimental distribution matches the theoretical one quite well.


Figure 6 shows the theoretical and experimental distributions under $H_1$ for the narrowband signal. The number of data points was again 50 and the SNR was 13 dB. Again, the two distributions are fairly close, except for the "bump" in the experimental distribution at low values of $L$. Observing the individual Monte Carlo runs, we found that in some of them the constraint term "pulled" the estimate $\hat\theta$ to relatively low values. This phenomenon is not accounted for by the first-order approximations derived in section 2, and serves to explain the difference between the two curves.


Figure 7 shows the two distributions under $H_1$ for the medium band case. The number of data points was 20 and the SNR was 16 dB. As we see, the "bump" near the origin is now larger than in the previous case. This means that the effect of the constraint term is now more severe, "pulling" $\hat\theta$ to low values more often. The bump causes an approximately constant difference between the two curves for $L > 30$.
6. CONCLUSIONS


We presented an adaptive scheme for detecting transient waveforms of unknown characteristics in white Gaussian noise. The detector is based on a generalized likelihood ratio test, and uses a constrained maximum likelihood estimation of the signal parameters. Approximate expressions were derived for the distributions of the likelihood ratio under $H_0$ and under $H_1$. It was shown that in the case of known noise variance the likelihood ratio is distributed as a quadratic form with the number of degrees of freedom equal to (or less than) the number of unknown parameters. By comparison, the energy detector is distributed as a $\chi^2$ with the number of degrees of freedom equal to the number of data points. Thus, the GLRT performs considerably better than the energy detector in cases where the number of unknown parameters is much smaller than the number of data points. In the case of unknown noise variance the GLRT involves ratios of quadratic forms.

When the theoretical distributions were compared to experimental ones, some discrepancy was observed. This discrepancy is attributed in part to the effect of the constraint term in the maximum likelihood estimator. For narrowband signals the discrepancy is small, while for medium or broadband signals it may be quite large.
ABSTRACT

The  paper  considers  the  asymptotic  accuracy  of  ARMA  parameter  estimation 
methods  based  on  a  fixed  number  of  sample  covariances.  A  general  expression 
for  the  error  covariance  of  the  ARMA  parameter  estimates  is  presented.  It  is 
shown  that  the  error  covariance  is  always  greater  than  a  certain  lower  bound, 
and  that  this  lower  bound  is  strictly  greater  than  the  Cramer-Rao  bound.  An 
explicit  ARMA  estimation  technique  that  asymptotically  achieves  the  bound  is 
presented. Finally, it is shown that this lower bound approaches the Cramer-Rao bound as the number of sample covariances tends to infinity.


This work was supported by the Army Research Office under contract No. DAAG29-83-C-0027.


I.  INTRODUCTION 


The problem of estimating the parameters of ARMA processes has been treated extensively in the statistical and engineering literature. ARMA parameter estimation techniques can be classified into two general categories: methods that use the data directly, and methods that apply some preliminary transformations to the data. Among the methods in the first class we mention in particular the exact maximum likelihood method, and its many approximations [1]-[4]. Typically, such approximations are aimed at preserving the asymptotic properties of the maximum likelihood method, namely consistency, asymptotic efficiency and asymptotic normality, while reducing its computational complexity. Among the second class, probably the most common approach is to transform the data into a finite set of sample covariances and then estimate the ARMA parameters from these sample covariances. References [5]-[7] include examples of this class of estimation techniques. Most of the system identification techniques used in practice are based, either explicitly or implicitly, on sample covariances.

In the special case of autoregressive (AR) processes, methods of the second class are known to be asymptotically equivalent to the maximum likelihood method [8]. The first $p+1$ sample covariances (where $p$ is the order of the AR process in question), while not being a sufficient statistic for the AR parameters [9], are known to yield asymptotically efficient estimates of the parameters via the Yule-Walker equations [8]. ARMA parameter estimation methods based on sample covariances are known to be less efficient than maximum likelihood ARMA estimates. This phenomenon can be explained as follows. At least $p+q+1$ sample covariances are needed to estimate the $p+q+1$ components of $\theta$. However, it was recently shown that only the sample covariances of orders $0 \le \tau \le p-q$ are asymptotically efficient estimates of the corresponding true covariances, while sample covariances of higher orders are not asymptotically efficient [10]. Since $p+q+1 > p-q+1$ for all $q > 0$, some loss of efficiency of the ARMA estimates based on the sample covariances is inevitable.

The discussion above naturally raises the question: what is the asymptotic accuracy of ARMA parameter estimation techniques based on sample covariances? Partial answers to this question can be found in the literature. The accuracy of a particular estimation technique, the so-called high order Yule-Walker method, was considered in [13], [14]. Note that these references treat only the accuracy of the estimates of the AR part of the ARMA parameters. The best accuracy achievable by any estimator based on sample covariances was studied by Bruzzone and Kaveh [15]-[18]. They defined a scalar measure of accuracy and computed its value for various examples. These results verified the inefficiency of ARMA estimates based on a finite number of sample covariances.

In  this  paper  we  present  a  fairly  complete  set  of  results  on  the 
asymptotic  accuracy  of  ARMA  parameter  estimation  techniques  based  on  sample 
covariances.  In  section  3  we  derive  general  asymptotic  expressions  for  the 
asymptotic  bias  and  covariance  of  a  general  class  of  ARMA  parameter  estimates 
based  on  sample  covariances.  In  section  4  we  briefly  review  previous  results 
on  the  accuracy  of  ARMA  parameter  estimates  based  on  a  finite  number  of  sample 
covariances,  and  present  new  proofs  of  these  results.  A  lower  bound  for  the 
error  covariance  matrix  is  presented  and  is  shown  to  be  strictly  larger  than 
the  Cramer-Rao  bound.  A  specific  estimator  which  achieves  this  bound  is  also 
presented.  In  section  5  we  prove  that  this  lower  bound  approaches  the  Cramer- 
Rao  bound  as  the  number  of  sample  covariances  tends  to  infinity.  This  result 
is  commonly  assumed  in  the  literature,  but  has  not  been  formally  proven. 
Finally,  we  illustrate  the  theoretical  results  by  some  numerical  examples. 

In  the  next  section  we  define  the  problem  under  consideration  and 
introduce  some  necessary  notation. 


2. PROBLEM STATEMENT

A Gaussian autoregressive moving-average (ARMA) process is defined by the difference equation

$$y_t = -\sum_{k=1}^{p} a_k y_{t-k} + u_t + \sum_{k=1}^{q} b_k u_{t-k} , \qquad (1)$$

where $\{u_t\}$ is a zero mean Gaussian white noise with variance $\sigma_u^2$. The polynomials

$$a(z) = 1 + a_1 z + \cdots + a_p z^p ; \quad b(z) = 1 + b_1 z + \cdots + b_q z^q ,$$

are required to satisfy the following conditions:

(i) $a(z) \ne 0$, $b(z) \ne 0$ for all $|z| \le 1$, i.e., all the roots of these polynomials are outside the unit circle;

(ii) $a_p \ne 0$, $b_q \ne 0$;

(iii) $a(z)$ and $b(z)$ are relatively prime, i.e., they have no common roots.

Conditions (ii) and (iii) imply minimality of the description (1) of the given process. Under these conditions, the $(p+q+1)$-dimensional parameter vector

$$\theta = [\sigma_u^2, a_1, \ldots, a_p, b_1, \ldots, b_q]^T$$

completely and uniquely determines the probability distribution of the process $\{y_t\}$. We will denote the set of all admissible values of $\theta$ by $\Theta$.
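To make the sign convention of the difference equation concrete, the sketch below simulates a process of the form $y_t = -\sum_k a_k y_{t-k} + u_t + \sum_k b_k u_{t-k}$ with Gaussian white noise input. The function name, the burn-in length used to reduce the effect of zero initial conditions, and the ARMA(1,1) coefficients in the usage example are assumptions made for illustration.

```python
import random

def simulate_arma(a, b, sigma_u, n, burn_in=500, seed=0):
    """Simulate y_t = -sum_k a[k-1]*y_{t-k} + u_t + sum_k b[k-1]*u_{t-k}."""
    rng = random.Random(seed)
    p, q = len(a), len(b)
    y_hist = [0.0] * p  # y_{t-1}, ..., y_{t-p}
    u_hist = [0.0] * q  # u_{t-1}, ..., u_{t-q}
    out = []
    for t in range(n + burn_in):
        u = rng.gauss(0.0, sigma_u)
        y = u
        y -= sum(ak * yk for ak, yk in zip(a, y_hist))
        y += sum(bk * uk for bk, uk in zip(b, u_hist))
        if p:
            y_hist = [y] + y_hist[:-1]
        if q:
            u_hist = [u] + u_hist[:-1]
        if t >= burn_in:  # discard the start-up transient
            out.append(y)
    return out

# Hypothetical ARMA(1,1): a(z) = 1 - 0.5 z, b(z) = 1 + 0.4 z, both roots outside the unit circle
y = simulate_arma([-0.5], [0.4], 1.0, 20000, seed=3)
sample_var = sum(v * v for v in y) / len(y)
```

For this example the theoretical variance is $(1 + 2(0.5)(0.4) + 0.4^2)/(1 - 0.5^2) = 2.08$, which the sample variance should approach for long records.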


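The covariances of $\{y_t\}$ are defined by

$$\sigma_{yy}(n) = \sigma_{yy}(-n) = E\{y_t y_{t-n}\} ; \quad -\infty < n < \infty . \qquad (2)$$

Let $S_M(\theta)$ denote the vector of $M+1$ consecutive covariances of the ARMA process whose parameter vector is $\theta$, i.e.

$$S_M(\theta) = [\sigma_{yy}(0), \sigma_{yy}(1), \ldots, \sigma_{yy}(M)]^T . \qquad (3)$$

The sample covariance corresponding to a set of consecutive measurements $\{y_1, y_2, \ldots, y_N\}$ is defined by

$$\hat\sigma_{yy}(n) = \frac{1}{N-n}\sum_{t=1}^{N-n} y_{t+n}\, y_t , \quad 0 \le n \le N-1 . \qquad (4)$$

The vector of sample covariances will be defined similarly to $S_M(\theta)$, i.e.

$$\hat S_M = [\hat\sigma_{yy}(0), \hat\sigma_{yy}(1), \ldots, \hat\sigma_{yy}(M)]^T . \qquad (5)$$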
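A vector of sample covariances can be computed as in the sketch below. It assumes the unbiased normalization by the number of available products at each lag, which is consistent with the unbiasedness property used later in the text; the function name is a choice made for this example.

```python
def sample_covariances(y, M):
    """Return the sample covariances of lags 0..M for a zero-mean sequence y.

    Each lag-k value averages the N - k available products y[t + k] * y[t],
    which makes the estimate unbiased for a zero-mean stationary process.
    """
    n = len(y)
    if M >= n:
        raise ValueError("M must be smaller than the number of samples")
    return [sum(y[t + k] * y[t] for t in range(n - k)) / (n - k)
            for k in range(M + 1)]

s_hat = sample_covariances([1.0, 2.0, 3.0, 4.0], 1)
```

Dividing by the record length instead of by the number of products gives a biased estimate; the two choices agree asymptotically for a fixed number of lags.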


In this paper we will consider estimates of the form

$$\hat\theta = g(\hat S_M) , \qquad (6)$$

where $M \ge p+q$. The function $g(\cdot)$ is assumed to satisfy the following regularity conditions:

(i) $g(\cdot)$ is continuous, with continuous partial derivatives up to third order;

(ii) $g(\hat S_M)$ is a consistent estimate of $\theta$. As is well known, $\hat S_M$ is a consistent estimate of $S_M(\theta)$. This, and the continuity of $g(\cdot)$, clearly imply that

$$g(S_M(\theta)) = \theta , \quad \text{for all } \theta \in \Theta . \qquad (7)$$


3.  THE  BIAS  AND  THE  COVARIANCE  OF  THE  ARMA  ESTIMATES 
In this section we derive general asymptotic expressions for the bias and the covariance of ARMA parameter estimates of the class $g(\hat S_M)$ defined in the previous section. We first recall some known properties of the vector of sample covariances $\hat S_M$ [8, Ch. 8].

(i) $\hat S_M$ is unbiased,

$$E\{\hat S_M\} = S_M . \qquad (8)$$

(ii) Asymptotically, the covariance of $\hat S_M$ is given by

$$\mathrm{Cov}\{\hat S_M\} = \frac{1}{N}\Sigma_M(\theta) + o(N^{-1}) , \qquad (9)$$

where $o(N^{-1})$ denotes a term negligible compared to $N^{-1}$. The elements of $\Sigma_M(\theta)$ are given by Bartlett's formula

$$[\Sigma_M(\theta)]_{k,l} = \sum_{m=-\infty}^{\infty} \sigma_{yy}(m)\,\sigma_{yy}(m+k-l) + \sum_{m=-\infty}^{\infty} \sigma_{yy}(m+k)\,\sigma_{yy}(m-l) . \qquad (10)$$

An explicit expression for $\Sigma_M(\theta)$ was derived in [17] for the special case in which the roots of $a(z)$ are simple and appear in complex conjugate pairs. A more general formula, which holds for any ARMA process, was derived in [10] and is given in Appendix A for completeness.

(iii) The vector $\sqrt{N}(\hat S_M - S_M)$ is asymptotically normal with zero mean and covariance matrix $\Sigma_M(\theta)$.

The above properties of $\hat S_M$ and $g(\hat S_M)$ imply the following.

Theorem 1: Both the bias and the covariance of $\hat\theta$ are asymptotically proportional to $N^{-1}$.

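Proof: Let $\bar o(N^{-1})$ denote a random variable such that

$$\lim_{N\to\infty} E\{\bar o(N^{-1})^2\} = 0 . \qquad (11)$$

Then, since $g(\cdot)$ has continuous partial derivatives up to third order, we can expand its $k$-th component $g_k(\hat S_M)$ in a second order Taylor series,

$$\hat\theta_k = g_k(\hat S_M) = g_k(S_M) + \frac{\partial g_k(S_M)}{\partial S_M}(\hat S_M - S_M) + \frac{1}{2}(\hat S_M - S_M)^T \frac{\partial^2 g_k(S_M)}{\partial S_M^2}(\hat S_M - S_M) + \bar o(N^{-1}) . \qquad (12)$$

Now, using the fact that $g_k(S_M) = \theta_k$ and that $\hat S_M$ is unbiased, we get

$$E\{\hat\theta_k\} - \theta_k = \frac{1}{2} E\left\{(\hat S_M - S_M)^T \frac{\partial^2 g_k(S_M)}{\partial S_M^2}(\hat S_M - S_M)\right\} + o(N^{-1}) = \frac{1}{N}\,\frac{1}{2}\,\mathrm{tr}\left\{\frac{\partial^2 g_k(S_M)}{\partial S_M^2}\,\Sigma_M(\theta)\right\} + o(N^{-1}) , \qquad (13)$$

where $\mathrm{tr}\{\ \}$ denotes the trace operator. This proves that the bias of $\hat\theta$ is asymptotically proportional to $N^{-1}$. Next let us denote by $G(\theta)$ the $(p+q+1) \times (M+1)$ matrix of partial derivatives of $g(S_M)$, expressed as a function of $\theta$. Then we get from (12),

$$\mathrm{Cov}\{\hat\theta\} = E\{(\hat\theta - \theta)(\hat\theta - \theta)^T\} = G(\theta)\, E\{(\hat S_M - S_M)(\hat S_M - S_M)^T\}\, G^T(\theta) + o(N^{-1}) = \frac{1}{N}\, G(\theta)\,\Sigma_M(\theta)\, G^T(\theta) + o(N^{-1}) . \qquad (14)$$

This proves that the covariance of $\hat\theta$ is asymptotically proportional to $N^{-1}$.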

Theorem 1 provides us with explicit asymptotic expressions for the bias and the covariance of $\hat\theta$ (equations (13) and (14) respectively). In particular, we observe that $\mathrm{Cov}\{\hat\theta\}$ is asymptotically dependent only on the parameters of the given process and on the Jacobian of $g(\cdot)$. This makes eq. (14) a useful tool in analyzing the performance of ARMA parameter estimation algorithms based on the sample covariances.
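As a small worked check of the covariance expression, the sketch below applies it to a first-order AR process and the estimator that takes the ratio of the lag-1 to the lag-0 sample covariance. It assumes the Gaussian form of Bartlett's formula, truncated to a finite number of terms, and writes the process as $y_t = \phi y_{t-1} + u_t$, so that $\phi = -a_1$ in the notation of (1). All function names and the truncation length are choices made for this example.

```python
def ar1_covariance(phi, lag, sigma_u=1.0):
    """True covariance of the AR(1) process y_t = phi*y_{t-1} + u_t at a given lag."""
    return sigma_u ** 2 * phi ** abs(lag) / (1.0 - phi ** 2)

def bartlett_entry(cov, k, l, terms=200):
    """Truncated Bartlett sum for the (k, l) entry of the limiting covariance matrix."""
    return sum(cov(m) * cov(m + k - l) + cov(m + k) * cov(m - l)
               for m in range(-terms, terms + 1))

def ar1_ratio_estimator_variance(phi):
    """N times the asymptotic variance of sigma_hat(1)/sigma_hat(0), via G * Sigma * G^T."""
    cov = lambda lag: ar1_covariance(phi, lag)
    s0, s1 = cov(0), cov(1)
    sigma = [[bartlett_entry(cov, k, l) for l in range(2)] for k in range(2)]
    grad = [-s1 / s0 ** 2, 1.0 / s0]  # Jacobian of g(S) = S[1] / S[0]
    return sum(grad[i] * sigma[i][j] * grad[j]
               for i in range(2) for j in range(2))

variance = ar1_ratio_estimator_variance(0.6)
```

The result agrees with the familiar value $1 - \phi^2$ for the Yule-Walker estimate of a first-order AR coefficient, which is a useful sanity check on both the Jacobian and the covariance matrix.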


The  bias  formula  will  not  be  needed  in  the  sequel.  Our  only  interest  in 
the  bias  behavior  is  for  justifying  the  use  of  the  Cramer-Rao  bound  for 
unbiased  estimators,  as  discussed  in  the  next  section. 




4. A LOWER BOUND ON THE COVARIANCE OF THE ARMA ESTIMATES


In this section we review briefly some previously published results concerning ARMA estimates based on sample covariances. While the main purpose of this section is to serve as an introduction to the next one, we have found it useful to provide alternative proofs to existing ones, due to reasons discussed in the sequel. Throughout this section, the inequality $A > B$ for matrices $A$ and $B$ means that $A - B$ is positive definite. Similarly, the inequality $A \ge B$ means that $A - B$ is a positive semidefinite matrix.

Let

$$S_M = f(\theta) \qquad (15)$$

denote the functional dependence of the vector of covariances $S_M$ on the parameter vector $\theta$. Let $F(\theta)$ denote the $(M+1) \times (p+q+1)$ matrix of partial derivatives of $f(\cdot)$. Let $P_M(\theta)$ be the matrix

$$P_M(\theta) = [F^T(\theta)\,\Sigma_M^{-1}(\theta)\,F(\theta)]^{-1} . \qquad (16)$$

Theorem 2:

$$G(\theta)\,\Sigma_M(\theta)\,G^T(\theta) \ge P_M(\theta) . \qquad (17)$$

This theorem can be proven by observing that the matrix $P_M^{-1}(\theta)$ is, except for a factor $N$, the asymptotic information matrix of the sample covariances; see e.g., [17, eq. (21)]. A direct proof, which does not rely on the asymptotic normality of the sample covariances, is given as follows. The consistency requirement (7) clearly implies that

$$G(\theta)F(\theta) = I_{p+q+1} , \quad \forall \theta \in \Theta \qquad (18)$$

when $M+1 \ge p+q+1$. Also,


M+1  {  ];„{0)  0 

p+q+l  {  0  0 


>  0 


(19) 


p+q+l 


Hence  we  have  (omitting  the  dependence  on  a  for  convenience) 


G  0 

Im  ^ 

gT  j-‘f 

SF 

0  0 

0  F 

F^G^'  F^[„-'f 

T 

>  0  . 


(20) 


Therefore, 


‘p.q.i 

^ P+q+l 

^ p+q+l  ° 

°  ^ P+q+l 

J P+q+l 

pTr-lr 

>  0  , 


(21) 


and finally

$$G(\theta)\,\Sigma_M(\theta)\,G^T(\theta) \ge [F^T(\theta)\,\Sigma_M^{-1}(\theta)\,F(\theta)]^{-1} . \qquad (22)$$

By Theorem 2, $N^{-1}P_M(\theta)$ is an asymptotic lower bound on the covariance of any estimate $\hat\theta = g(\hat S_M)$. The closed-form expressions for $\Sigma_M(\theta)$ and $F(\theta)$ given in appendix A enable the computation of this bound as a function of the ARMA process parameters $\theta$, without computing the roots of $a(z)$ as was required in [17].




We now turn our attention to the relationship between the bound given in (17) and the Cramer-Rao bound (CRB). For a biased estimate $\hat\theta$ with a bias term $b(\theta)$ the CRB is given by [11, Ch. 4]

$$\mathrm{Cov}\{\hat\theta\} \ge \left[I + \frac{\partial b(\theta)}{\partial\theta}\right] I_N^{-1}(\theta) \left[I + \frac{\partial b(\theta)}{\partial\theta}\right]^T , \qquad (23)$$

where $I_N(\theta)$ is the Fisher information matrix corresponding to $\theta$ (assuming $N$ measurements) and $\partial b(\theta)/\partial\theta$ is the Jacobian matrix of $b(\theta)$. From Theorem 1 we know that $b(\theta)$ is asymptotically proportional to $N^{-1}$. Hence the same is true for $\partial b(\theta)/\partial\theta$. Also, $I_N^{-1}(\theta)$ is known to be asymptotically proportional to $N^{-1}$ [1]. Therefore, the right-hand side of (23) differs from $I_N^{-1}(\theta)$ by a term $o(N^{-1})$, and we can asymptotically replace (23) by

$$\mathrm{Cov}\{\hat\theta\} + o(N^{-1}) \ge I_N^{-1}(\theta) = \mathrm{CRB}\{\theta\} . \qquad (24)$$

This argument justifies the comparison of the bound $P_M(\theta)$ to $N I_N^{-1}(\theta)$ rather than to the bound given in (23) in the asymptotic case. The next theorem asserts that, for every finite $M$, $P_M(\theta)$ is strictly greater than the CRB.

Theorem 3:

$$P_M(\theta) > N\, I_N^{-1}(\theta) . \qquad (25)$$

This theorem follows as a special case of a more general theorem; see Theorem 5 and the corollary preceding it in [16]. An alternative proof follows from Theorem 4 in [10]. There it was shown that for all $M > p-q$,

$$\Sigma_M(\theta) > N\, F(\theta)\, I_N^{-1}(\theta)\, F^T(\theta) . \qquad (26)$$

Therefore we have (omitting the dependence on $\theta$ for convenience),



Hence

$$[F^T\Sigma_M^{-1}F]^{-1} > N\, I_N^{-1} . \qquad (28)$$

Note that the condition $M > p-q$ is implied by $M \ge p+q$ for all $q \ge 1$. The only exception is $q = 0$, i.e., the case of a pure AR process. Indeed, for AR processes it can be shown (using the formulas given in [10]) that

$$[F^T\Sigma_p^{-1}F]^{-1} = N\, I_N^{-1} , \qquad (29)$$

i.e., estimates of the AR parameters based on the sample covariances $\hat S_p$ (e.g., the Yule-Walker estimate) can be asymptotically efficient.

It is not difficult to show that the bound $P_M(\theta)$ is tight, i.e., there exists an estimate $g(\hat S_M)$ that asymptotically achieves this bound. To show this, let us define

$$V(x, \hat S_M) = [S_M(x) - \hat S_M]^T\, \Sigma_M^{-1}(x)\, [S_M(x) - \hat S_M] , \quad x \in \Theta . \qquad (30)$$

We define $g(\hat S_M)$ to be the value of $x \in \Theta$ for which $V(x, \hat S_M)$ attains a global minimum. The estimate $\hat\theta$ satisfies the consistency condition (7) since clearly

$$V(\theta, S_M(\theta)) = 0 , \qquad (31)$$

i.e., the true parameter $\theta$ is a global minimizer of $V$ when the sample covariances $\hat S_M$ are replaced by the true covariances $S_M(\theta)$. Also, $V(x, \hat S_M)$ is a rational function of $x$ and $\Sigma_M(x)$ is nonsingular for all $x \in \Theta$.* Hence the partial derivatives of $g(\cdot)$ exist and are continuous to any order. The following theorem asserts that the estimate $\hat\theta$ defined above achieves the bound $P_M(\theta)$.

*Note that $\Sigma_M(x)$ is a rational function of $x$, as shown in the appendix.


Theorem 4: The asymptotic covariance matrix of the estimate $\hat\theta$ defined above is given by

$$\lim_{N\to\infty} N\, \mathrm{Cov}\{\hat\theta\} = P_M(\theta) . \qquad (32)$$

Proof : 


By eq. (14), we only have to show that the Jacobian $G(\theta)$ yields equality in (17). The first step is to show that the Jacobian is given by


(33) 


The  next  step  is  to  evaluate  the  two  terms  at  the  right-hand  side  of  (33). 
This  yields 


(34a) 

(34b) 

$$G(\theta) = [F^T(\theta)\,\Sigma_M^{-1}(\theta)\,F(\theta)]^{-1} F^T(\theta)\,\Sigma_M^{-1}(\theta) . \qquad (34c)$$

Finally, when $G(\theta)$ is substituted into (14), we obtain the stated equality (32). See appendix B for a more complete proof.


An algorithm of the type discussed here was given in [20], and is closely related to the one proposed by Walker [5]. This algorithm requires a considerable amount of computation, due to the need to invert $\Sigma_M(\theta)$ at each iteration. Thus, it is not necessarily recommended for practical applications.


5. THE LIMITING BEHAVIOR OF $P_M(\theta)$

So far we have restricted our discussion to estimates based on a fixed number of sample covariances. In this section we examine the limiting behavior of the lower bound $P_M(\theta)$ as the number of sample covariances $M$ goes to infinity.

We will show that the limit of $P_M(\theta)$ is equal to the asymptotic Cramer-Rao bound. Therefore, the relative asymptotic efficiency of estimates of the form $\hat\theta = g(\hat S_M)$ can be made arbitrarily close to unity by increasing the number of sample covariances and by using them in an optimal manner (e.g., as in the estimate of Theorem 4).

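Let us denote

$$I(\theta) = \lim_{N\to\infty} N^{-1} I_N(\theta) , \qquad (35)$$

i.e., $I(\theta)$ is the limiting information matrix, normalized by the number of data points. Let $\phi(\omega)$ denote the power spectral density of the given process, i.e.,

$$\phi(\omega) = \sum_{m=-\infty}^{\infty} \sigma_{yy}(m) \cos m\omega . \qquad (36)$$

For ARMA processes, the power spectral density is an absolutely continuous function of $\omega$. Also, since $b(z)$ was assumed to be nonzero on the unit circle, $\phi(\omega)$ is strictly positive for all $\omega$. Therefore, the elements of $I(\theta)$ can be expressed by Whittle's asymptotic formula [20]

$$[I(\theta)]_{k,l} = \frac{1}{4\pi}\int_{-\pi}^{\pi} \frac{1}{\phi^2(\omega)}\, \frac{\partial\phi(\omega)}{\partial\theta_k}\, \frac{\partial\phi(\omega)}{\partial\theta_l}\, d\omega , \quad 1 \le k, l \le p+q+1 . \qquad (37)$$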
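Whittle's formula can be checked numerically on a simple case. The sketch below evaluates the integral $\frac{1}{4\pi}\int_{-\pi}^{\pi} \phi^{-2}(\omega)\,(\partial\phi/\partial\theta)^2\, d\omega$ for the single coefficient $a_1$ of a first-order AR process with $a(z) = 1 + a_1 z$, using a midpoint rule and a central finite difference for the derivative. The spectral density form $\sigma_u^2 / |a(e^{j\omega})|^2$, the grid size and the step `eps` are assumptions of this example.

```python
import math

def ar1_spectrum(a1, omega, sigma_u=1.0):
    """Spectral density sigma_u^2 / |1 + a1*exp(j*omega)|^2 of a first-order AR process."""
    return sigma_u ** 2 / (1.0 + 2.0 * a1 * math.cos(omega) + a1 ** 2)

def whittle_information_a1(a1, grid=4096, eps=1e-6):
    """Normalized Fisher information of a1 from Whittle's formula (midpoint rule)."""
    h = 2.0 * math.pi / grid
    total = 0.0
    for i in range(grid):
        w = -math.pi + (i + 0.5) * h
        phi = ar1_spectrum(a1, w)
        # Central finite difference for the derivative of phi with respect to a1
        dphi = (ar1_spectrum(a1 + eps, w) - ar1_spectrum(a1 - eps, w)) / (2.0 * eps)
        total += (dphi / phi) ** 2 * h
    return total / (4.0 * math.pi)

info = whittle_information_a1(0.5)
```

The value agrees with the known normalized information $1/(1 - a_1^2)$ for a first-order AR coefficient, whose inverse $1 - a_1^2$ is the normalized Cramer-Rao bound.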

Using  this  formula,  we  will  now  prove  the  following  theorem. 


Theorem 5:

$$\lim_{M\to\infty} P_M(\theta) = I^{-1}(\theta) . \qquad (38)$$

Proof:


Let $L_2[-\pi,\pi]$ be the Hilbert space of Lebesgue measurable functions on $[-\pi,\pi]$ whose square-magnitudes are Lebesgue integrable. Let $H$ denote the subspace of $L_2[-\pi,\pi]$ consisting of all real even functions. The inner product of two members of $H$ is given by

$$\langle h_1(\omega), h_2(\omega)\rangle = \int_{-\pi}^{\pi} h_1(\omega)\, h_2(\omega)\, d\omega . \qquad (38)$$


For ARMA processes, the spectral density $\phi(\omega)$ satisfies


(39) 


The  functions  j,(u),  H--- 


def i ne 


jiTti)')'  *  belong  to  H.  Let  us 


$$V_k(\omega) = \frac{\phi(\omega)\cos k\omega}{\sqrt{\pi}} , \quad k = 0, 1, 2, \ldots \qquad (40)$$

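The sequence $\{V_k(\omega), k \ge 0\}$ spans the space $H$. To see this, suppose that there exists $h(\omega) \in H$ such that

$$\int_{-\pi}^{\pi} h(\omega)\, \frac{\phi(\omega)\cos k\omega}{\sqrt{\pi}}\, d\omega = 0 , \quad \forall k . \qquad (41)$$

Since $\phi(\omega)$ is bounded, $h(\omega)\phi(\omega) \in H$. Furthermore, the sequence $\{\cos k\omega, k \ge 0\}$ is known to be complete in $H$. Hence $\phi(\omega)h(\omega) = 0$, and since $\phi(\omega)$ is strictly positive, $h(\omega) = 0$. This proves that $\{V_k(\omega), k \ge 0\}$ spans $H$.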


Let $\{U_k(\omega), k \ge 0\}$ be the sequence obtained by Gram-Schmidt
orthonormalization of $\{V_k(\omega), k \ge 0\}$. From the discussion above it follows
that $\{U_k(\omega), k \ge 0\}$ is a complete orthonormal sequence. The two sequences
are known to be related via

$$\begin{bmatrix} U_0(\omega) \\ U_1(\omega) \\ \vdots \\ U_M(\omega) \end{bmatrix} = W_M^{-1/2} \begin{bmatrix} V_0(\omega) \\ V_1(\omega) \\ \vdots \\ V_M(\omega) \end{bmatrix}, \quad \forall M \ge 0, \tag{42}$$

where $W_M$ is the Gramian matrix of $\{V_k(\omega), 0 \le k \le M\}$ and $W_M^{1/2}$ is its lower
triangular square-root. Recall that the Gramian matrix is given by

$$\{W_M\}_{k,\ell} = \langle V_k(\omega), V_\ell(\omega) \rangle.$$

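Next note that (see e.g. [8, p. 465, Theorem 8.3.3])

$$\{\Sigma_M(\theta)\}_{k,\ell} = \lim_{N \to \infty} N \, \mathrm{cov}\{\hat\sigma_{yy}(k), \hat\sigma_{yy}(\ell)\} = \frac{1}{\pi} \int_{-\pi}^{\pi} \phi^2(\omega) \cos k\omega \cos \ell\omega \, d\omega = \int_{-\pi}^{\pi} V_k(\omega) V_\ell(\omega) \, d\omega = \langle V_k(\omega), V_\ell(\omega) \rangle.$$

Hence the Gramian in (42) is just $\Sigma_M(\theta)$, i.e.,

$$W_M = \Sigma_M(\theta), \quad M \ge 0. \tag{43}$$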

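Let us now turn our attention to the entries of the matrix F. We have

$$\{F(\theta)\}_{n,k} = \frac{\partial \sigma_{yy}(n)}{\partial \theta_k} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{\partial \phi(\omega)}{\partial \theta_k} \cos n\omega \, d\omega = \int_{-\pi}^{\pi} \frac{1}{2\sqrt{\pi}\,\phi(\omega)} \frac{\partial \phi(\omega)}{\partial \theta_k} \cdot \frac{\phi(\omega) \cos n\omega}{\sqrt{\pi}} \, d\omega = \langle \alpha(\omega), V_n(\omega) \rangle,$$

where

$$\alpha(\omega) = \frac{1}{2\sqrt{\pi}\,\phi(\omega)} \frac{\partial \phi(\omega)}{\partial \theta_k}.$$

Similarly,

$$\{F(\theta)\}_{n,\ell} = \frac{\partial \sigma_{yy}(n)}{\partial \theta_\ell} = \langle \beta(\omega), V_n(\omega) \rangle,$$

where

$$\beta(\omega) = \frac{1}{2\sqrt{\pi}\,\phi(\omega)} \frac{\partial \phi(\omega)}{\partial \theta_\ell}.$$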

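Using (16), (42), (45), (46) and (48) we see that

$$[F^T(\theta) \Sigma_M^{-1}(\theta) F(\theta)]_{k,\ell} = \sum_{m=0}^{M} \sum_{n=0}^{M} \langle \alpha(\omega), V_m(\omega) \rangle [\Sigma_M^{-1}(\theta)]_{m,n} \langle \beta(\omega), V_n(\omega) \rangle = \sum_{m=0}^{M} \langle \alpha(\omega), U_m(\omega) \rangle \langle \beta(\omega), U_m(\omega) \rangle = \langle \alpha_M(\omega), \beta_M(\omega) \rangle,$$

where $\alpha_M(\omega)$ and $\beta_M(\omega)$ are, respectively, the projections of $\alpha(\omega)$ and $\beta(\omega)$ on
the subspace spanned by $\{U_m(\omega), 0 \le m \le M\}$.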


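By Whittle's formula (37) we have

$$I_{k,\ell}(\theta) = \int_{-\pi}^{\pi} \left[ \frac{1}{2\sqrt{\pi}\,\phi(\omega)} \frac{\partial \phi(\omega)}{\partial \theta_k} \right] \left[ \frac{1}{2\sqrt{\pi}\,\phi(\omega)} \frac{\partial \phi(\omega)}{\partial \theta_\ell} \right] d\omega = \langle \alpha(\omega), \beta(\omega) \rangle. \tag{51}$$

Thus, it only remains to show that

$$\lim_{M \to \infty} \langle \alpha_M(\omega), \beta_M(\omega) \rangle = \langle \alpha(\omega), \beta(\omega) \rangle. \tag{52}$$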

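By the completeness of $\{U_m(\omega), m \ge 0\}$ we have (omitting the dependence on
$\omega$ for convenience)

$$|\langle \alpha_M, \beta_M \rangle - \langle \alpha, \beta \rangle| \le |\langle \alpha_M, \beta_M \rangle - \langle \alpha_M, \beta \rangle| + |\langle \alpha_M, \beta \rangle - \langle \alpha, \beta \rangle| \le \|\alpha_M\| \cdot \|\beta_M - \beta\| + \|\alpha_M - \alpha\| \cdot \|\beta\| \to 0 \quad \text{as } M \to \infty.$$

This completes the proof of the theorem.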
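The convergence stated in the theorem can be checked numerically. The sketch below is an illustration only and is not taken from the paper: it assumes a hypothetical MA(1) process $y_t = u_t + b u_{t-1}$ with parameter vector $(b, \sigma^2)$, builds the functions $V_k$ of (40) on a uniform frequency grid, and compares $F^T \Sigma_M^{-1} F$ with the Whittle integral (37). The function name and grid size are assumptions.

```python
import numpy as np

def covariance_bound_information(b, sigma2, M, n_grid=8192):
    """Return (F^T Sigma_M^{-1} F, Whittle information) for an assumed MA(1) process.

    Hypothetical example: y_t = u_t + b*u_{t-1}, theta = (b, sigma2).
    """
    w = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid    # uniform grid on [-pi, pi)
    dw = 2.0 * np.pi / n_grid
    phi = sigma2 * (1.0 + b * b + 2.0 * b * np.cos(w))        # spectral density
    dphi = np.vstack([sigma2 * (2.0 * b + 2.0 * np.cos(w)),   # d phi / d b
                      phi / sigma2])                          # d phi / d sigma2
    cosines = np.cos(np.outer(np.arange(M + 1), w))           # cos(k w), k = 0..M

    V = phi * cosines / np.sqrt(np.pi)                        # V_k of (40)
    Sigma = V @ V.T * dw                                      # Gramian <V_k, V_l>
    alpha = dphi / (2.0 * np.sqrt(np.pi) * phi)               # one row per parameter
    F = V @ alpha.T * dw                                      # F[n, k] = <alpha_k, V_n>
    bound_info = F.T @ np.linalg.solve(Sigma, F)
    whittle = alpha @ alpha.T * dw                            # Whittle integral (37)
    return bound_info, whittle

for M in (1, 2, 5, 10, 30):
    info_M, info = covariance_bound_information(0.5, 1.0, M)
    print(M, info_M[0, 0], info[0, 0])
```

The entry for b grows with M and approaches the Whittle value, which is the behavior the proof establishes through the projections $\alpha_M$ and $\beta_M$.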

Remark: It is clear from the proof that the theorem is not restricted to ARMA
processes. In fact, the following conditions are sufficient for the theorem
to hold.

(i) The power spectral density $\phi(\omega)$ of the process satisfies
(39).


6.  SOME  NUMERICAL  EXAMPLES 


In this section we illustrate the behavior of the bound (16) as a
function of the number of sample covariances M, by two examples. In both
examples the ARMA processes are of order (2,2), with a pair of complex zeroes
and a pair of complex poles. The zeroes are at angles ±45° with respect to
the positive real axis, and the poles are at angles ±135°. The absolute
values of both the zeroes and the poles are $(0.5)^{-1/2}$ in the first example
and $(0.9)^{-1/2}$ in the second. The corresponding polynomials are


$$\frac{b(z)}{a(z)} = \frac{1 - z + 0.5z^2}{1 + z + 0.5z^2} \quad \text{(first example)} \tag{54a}$$

$$\frac{b(z)}{a(z)} = \frac{1 - 1.273z + 0.9z^2}{1 + 1.273z + 0.9z^2} \quad \text{(second example)} \tag{54b}$$
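The zero and pole locations implied by the polynomials above can be recovered with a few lines of code. This is only a convenience sketch; the helper name `root_summary` is hypothetical, and the coefficients are typed in as they are printed in (54a) and (54b).

```python
import numpy as np

def root_summary(coeffs_ascending):
    """Magnitudes and angles (degrees) of the roots of c0 + c1*z + c2*z^2 + ..."""
    roots = np.roots(coeffs_ascending[::-1])    # np.roots expects highest power first
    roots = roots[np.argsort(np.angle(roots))]  # sort by angle for a stable order
    return np.abs(roots), np.degrees(np.angle(roots))

examples = {
    "b(z), first example": [1.0, -1.0, 0.5],
    "a(z), first example": [1.0, 1.0, 0.5],
    "b(z), second example": [1.0, -1.273, 0.9],
    "a(z), second example": [1.0, 1.273, 0.9],
}
for name, coeffs in examples.items():
    mags, angs = root_summary(coeffs)
    print(name, np.round(mags, 4), np.round(angs, 2))
```

For the first example the roots have magnitude $(0.5)^{-1/2}$ at exactly ±45° and ±135°. For the second example the magnitudes equal $(0.9)^{-1/2}$, while the angles computed from the printed coefficient 1.273 come out near ±47.9° and ±132.1°, close to but not exactly the nominal values.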


In the first example we computed the bound up to M=20. The bounds on the
standard deviations of $\hat a_1$, $\hat a_2$, $\hat b_1$, $\hat b_2$ are shown in figures 1a, 1b, 1c, 1d
respectively. Also shown in the figures are the respective Cramer-Rao bounds
(standard deviations) on the parameters (the horizontal lines). As we see,
for M ≥ 12 the bound is practically indistinguishable from the CRB. For M=4,
the minimum possible number, any ARMA method based on sample covariances would
be quite inefficient.


In the second example we computed the bound up to M=50. The results are
shown in figures 2a, 2b, 2c, 2d. As we see, the effect of moving the poles
towards the unit circle is to enhance the relative efficiency of the AR
coefficients. On the other hand, moving the zeroes towards the unit circle
results in a slow decrease of the bound on the MA coefficients. In this
example, even at M=50 there is still a considerable gap between the bounds on
$\hat b_1$ and $\hat b_2$ and the respective CRB's.


7. CONCLUSION


In  this  paper  we  presented  expressions  for  the  asymptotic  accuracy  of 
ARMA  parameter  estimation  techniques  based  on  a  finite  number  of  sample 
covariances.  The  error  covariance  matrix  of  any  estimation  technique  of  this 
class  is  bounded  from  below  by  a  bound  which  is  strictly  larger  than  the 
Cramer-Rao  bound.  Furthermore,  this  lower  bound  is  tight:  it  can  be  achieved 
by  using  the  specific  ARMA  estimation  method  given  in  section  4.  It  was  also 
shown  that  this  lower  bound  approaches  the  Cramer-Rao  bound  as  the  number  of 
sample  covariances  tends  to  infinity. 

Finally we remark that the results presented in sections 3 and 4 can be
easily generalized to the situation where only a subset of
$\{\hat\sigma_{yy}(0), \ldots, \hat\sigma_{yy}(M)\}$ is used. All that needs to be done is to delete
appropriate rows and columns from the matrices $G(\theta)$ and $F(\theta)$. The
variance expressions remain otherwise unchanged. However, the results of
section 5 will no longer be valid: discarding some of the sample covariances
will generally cause loss of efficiency. See [15], [17], [18] for the case
where the set of sample covariances starts at the k-th lag, rather than at the
zero-th lag.


APPENDIX A: EXPLICIT EXPRESSIONS FOR $F(\theta)$ AND $\Sigma_M(\theta)$


In this appendix we quote, without proofs, some results derived in
[10]. Let us first introduce some notations as follows. Let $\{r_{xx}(\ell)\}$, $\{r_{xw}(\ell)\}$
and $\{r_{yy}(\ell)\}$ be the coefficients in the Laurent series


a(z)a(z  )  i*— 


a{z)b{z  )  i 


7  r  ( 2. )  2*  * 


b(z)b(z"M  _  “  -  -1 

- TT - I  • 

a(z)a(z  )  i=— 


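Let $R_{xx}^{i,j}(\ell)$, $R_{xw}^{i,j}(\ell)$ and $R_{yy}^{i,j}(\ell)$ be the i×j Toeplitz matrices

$$\{R_{xx}^{i,j}(\ell)\}_{m,n} = r_{xx}(\ell-m+n); \quad \{R_{xw}^{i,j}(\ell)\}_{m,n} = r_{xw}(\ell-m+n); \quad \{R_{yy}^{i,j}(\ell)\}_{m,n} = r_{yy}(\ell-m+n).$$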


Let A and B be the p×p and q×q companion matrices


;  m=l,n=k 

m*l,  n*k 

1 

;  m=n+l 

m®n+l 

(A3) 

m,n  1  g 

;  otherwi se 

m,n  1 

otherwi  se 

Let  H  be  the  (M+q)  x  (M+q)  Hankel  matrix 


I  ?  -  VI, 

'm,n  k 
0 


m+n  =  M+q+1 
m+n  =  H+q+k+l 
otherwise 


1  <  k  (  p 


Let  J  be  the  (M+l)x(M+l)  matrix 

i  1  ;  m+n  =  M+2 

{J} 

^  J  m  n  f 

'  ’  *  0  ;  otherwi se 


m 


-  ^  V 


Let $e_i$ be an i-dimensional unit vector with 1 in the first position and zeros
elsewhere.


Finally,  let  K  be  the  (M+n)x(M+l)  matrix  defined  by 


(A6) 


Then we have the following results.


Lemma  1:  The  matrices  and  r'’’^(0)  satisfy  the  matrix  Lyapunov 

—  XX  xw 

equations 


P’P(O) 
XX  ' 

{A71 

XW  ' 

(AS) 

These Lyapunov equations admit rational closed-form solutions as
explained, e.g., in [12].


Lemma  2:  computed  for  al  1  z  >  o  using  the 

solutions  to  (A7),  (A8),  and  the  recursions 


0  0 

r  (i)  =  I  a,  r  (z-k)  ;  r  (z)  *  -  I  a,  r  (z-k)  ;  z  >  p  . 
'  k  XX  xw  k  xw 


XX 


{A9) 


Hence, the matrices $R_{xx}^{i,j}(\ell)$ and $R_{xw}^{i,j}(\ell)$ can be computed for any desired
values of i, j and $\ell$.


Lemma  3:  Denote  by  the  matrix  of  partial  derivatives  of  with  respect 

to  (a, ,  a.,  . . .  a  }  ,  and  similarly  for  7.S,.  .  Then 
i  2  p  b  M 


2  T  M+q,p 
®u^  Sx 


(0) 


(AlO) 


rr  C 


f  n  1  i 


Then  the  matrix  Ffa)  is  given  by 


APPENDIX B: PROOF OF THEOREM 4


It is sufficient to evaluate $G(\theta)$ and then compute the corresponding
value of the right-hand side of (14). Since $\hat\theta$ is a global minimizer of
$V(x, \hat S_M)$,

$$\left. \frac{\partial V(x, \hat S_M)}{\partial x} \right|_{x=\hat\theta} = 0. \tag{B1}$$


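Let us now perturb $\hat S_M$ by a differential amount $d\hat S_M$, and let $\hat\theta + d\hat\theta$ be the
global minimizer of $V(x, \hat S_M + d\hat S_M)$. Then

$$\left. \frac{\partial V(x, \hat S_M + d\hat S_M)}{\partial x} \right|_{x=\hat\theta+d\hat\theta} = 0, \tag{B2}$$

where, to first order,

$$\left. \frac{\partial V(x, \hat S_M + d\hat S_M)}{\partial x} \right|_{x=\hat\theta+d\hat\theta} = \left. \frac{\partial V(x, \hat S_M)}{\partial x} \right|_{x=\hat\theta} + \left. \frac{\partial^2 V(x, \hat S_M)}{\partial x^2} \right|_{x=\hat\theta} d\hat\theta + \left. \frac{\partial^2 V(x, \hat S_M)}{\partial x \, \partial \hat S_M} \right|_{x=\hat\theta} d\hat S_M. \tag{B3}$$

Using (B1) and (B2) we get

$$d\hat\theta = - \left[ \frac{\partial^2 V(x, \hat S_M)}{\partial x^2} \right]^{-1}_{x=\hat\theta} \left[ \frac{\partial^2 V(x, \hat S_M)}{\partial x \, \partial \hat S_M} \right]_{x=\hat\theta} d\hat S_M. \tag{B4}$$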


Therefore, the matrix of partial derivatives of g(.) is given by

$$G = - \left[ \frac{\partial^2 V(x, \hat S_M)}{\partial x^2} \right]^{-1}_{x=\hat\theta} \left[ \frac{\partial^2 V(x, \hat S_M)}{\partial x \, \partial \hat S_M} \right]_{x=\hat\theta}. \tag{B5}$$

Next we evaluate the partial derivatives appearing in (B5).


3V 


It  -  -  Sm]  * 


3S.Jx) 


.rs„(xl-i„l^J-l(xl  |^J-l,x);s,(x)-S„] 


1^-  [^1IJ‘<*'(S„U..S,] 

<  I  k  1 

3S  (x)  3V  (x) 

“C~3x7  1  Im  “ix  Im 

K  i 


T  1 


■  “‘'k 


3  S,.  ( X )  .»  I  3  Sij  ( > 


«  T  1  3Sy(x)  .  3Su{x) 

1m  (x)™~  (x)r--i_i  +[Sj,(x)-S^]^J‘^(x)[ 


3S„(x)  _  ^  1  « 

3X  ]  1m  1x7  1m 


«  T  -1  1  1 

*  iM  (X)  If'l  Txy  Im 

2„ 

-[S,(x)-S;]V/(x)i^^-I(x)[S,(x)-s;] 

k  I 

■*  T  -I  1  1 

i„  (X)  (x)  -52_j-1(,,[s^(,,.s  j 


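Recall now that in (14), $G(\theta)$ has to be evaluated at the true values of
$\hat\theta$ and $\hat S_M$, i.e. at $\hat\theta = \theta$ and $\hat S_M = S_M(\theta)$. Substituting in (B7) and (B8) we
see that most of the terms vanish and we get

$$\left. \frac{\partial^2 V}{\partial x^2} \right|_{x=\theta} = 2 \left[ \frac{\partial S_M(x)}{\partial x} \right]^T_{x=\theta} \Sigma_M^{-1}(\theta) \left[ \frac{\partial S_M(x)}{\partial x} \right]_{x=\theta} = 2 F^T(\theta) \Sigma_M^{-1}(\theta) F(\theta), \tag{B9}$$

$$\left. \frac{\partial^2 V}{\partial x \, \partial \hat S_M} \right|_{x=\theta} = -2 \left[ \frac{\partial S_M(x)}{\partial x} \right]^T_{x=\theta} \Sigma_M^{-1}(\theta) = -2 F^T(\theta) \Sigma_M^{-1}(\theta). \tag{B10}$$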


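Finally we get,

$$G(\theta) = [F^T(\theta) \Sigma_M^{-1}(\theta) F(\theta)]^{-1} F^T(\theta) \Sigma_M^{-1}(\theta) \tag{B11}$$

and

$$G(\theta) \Sigma_M(\theta) G^T(\theta) = (F^T \Sigma_M^{-1} F)^{-1} (F^T \Sigma_M^{-1} F) (F^T \Sigma_M^{-1} F)^{-1} = [F^T(\theta) \Sigma_M^{-1}(\theta) F(\theta)]^{-1}. \tag{B12}$$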
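The algebra of the last two steps is easy to confirm numerically. In the sketch below the matrices are random stand-ins with hypothetical sizes, not the actual $F(\theta)$ and $\Sigma_M(\theta)$ of an ARMA process; the unweighted alternative `G_ls` is included only as an assumed point of comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cov, n_par = 6, 3                        # hypothetical sizes: M+1 covariances, p+q+1 parameters
F = rng.standard_normal((n_cov, n_par))    # stand-in for F(theta), full column rank
B = rng.standard_normal((n_cov, n_cov))
Sigma = B @ B.T + n_cov * np.eye(n_cov)    # stand-in for Sigma_M(theta), positive definite

Sigma_inv = np.linalg.inv(Sigma)
info = F.T @ Sigma_inv @ F
G = np.linalg.solve(info, F.T @ Sigma_inv)  # G = (F^T Sigma^-1 F)^-1 F^T Sigma^-1
cov_opt = G @ Sigma @ G.T                   # equals inv(info)

G_ls = np.linalg.solve(F.T @ F, F.T)        # an unweighted choice that also satisfies G F = I
cov_ls = G_ls @ Sigma @ G_ls.T              # never smaller than cov_opt
print(np.allclose(cov_opt, np.linalg.inv(info)))
```

The weighted choice reproduces $[F^T \Sigma_M^{-1} F]^{-1}$, and the unweighted choice yields a covariance that exceeds it by a positive semidefinite matrix.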
Figure 2: Bounds on the ARMA Parameters: a) $a_1$, b) $a_2$, c) $b_1$, d) $b_2$


ABSTRACT


An  explicit  expression  is  derived  for  the  Cramer-Rao  bound  on  unbiased 
estimates  of  the  parameters  of  autoregressive  processes,  given  a  finite  number 
of  measurements.  The  expression  converges  to  the  well-known  asymptotic  form 
of  the  CRB  when  the  number  of  measurements  tends  to  infinity.  The  behavior  of 
the bound is illustrated by some numerical examples.


This work was supported by the Army Research Office under contract no.
DAAG29-83-C-0027.


1.  INTRODUCTION 


Autoregressive  (AR)  modeling  techniques  are  widely  used  for  spectral 
analysis,  estimation/prediction  of  stationary  time  series,  and  adaptive 
filtering.  Numerous  algorithms  have  been  developed  for  fitting  AR  models  to 
data.  To  evaluate  the  performance  of  AR  techniques  in  different  applications 
it  is  often  necessary  to  evaluate  the  accuracy  of  the  AR  parameter  estimates 
obtained  from  a  given  amount  of  data.  Asymptotic  analysis  of  AR  parameter 
estimation accuracy is a relatively easy task, since the AR model is just a
special  case  of  linear  regression,  except  for  the  initial  transient.  In 
particular,  the  asymptotic  Cramer-Rao  Bound  (CRB)  on  any  unbiased  estimate  of 
the  AR  parameters  is  well  known  [1]:  it  is  just  the  inverse  of  the  covariance 
matrix  of  the  process,  divided  by  the  number  of  measurements. 

For  short  data  records,  the  actual  CRB  differs  from  the  asymptotic 
expression,  due  to  the  initial  transient  of  the  linear  regression.  In  this 
note  we  derive  an  explicit  expression  for  the  Fisher  information  matrix 
associated  with  a  finite  number  of  measurements  of  a  Gaussian  AR  process.  The 
information  matrix  is  shown  to  be  the  sum  of  a  constant  matrix  and  the  matrix 
appearing  in  the  asymptotic  approximation.  The  CRB  is  then  given  as  the 
inverse  of  the  information  matrix.  It  is  shown  that  the  exact  CRB  can  be 
either  larger  or  smaller  than  its  asymptotic  approximation,  depending  on  the 
process  bandwidth.  In  particular,  narrowband  processes  tend  to  have  CRB's 
which  are  considerably  smaller  than  the  respective  asymptotic  approximations. 

In  the  next  section  we  derive  the  formulas  for  the  exact  CRB,  and  in 
section 3 we provide some numerical examples illustrating the behavior of this
bound.

It  should  be  noted  that  the  CRB  is  not  necessarily  a  tight  bound  for  a 
finite number of data points. Thus, the best achievable parameter estimation
accuracy  may  be  considerably  poorer  than  predicted  by  the  CRB  when  the  number 
of  data  points  is  small. 


2.  DERIVATION  OF  THE  EXACT  CRB 


Let $\{y_t\}$ be an n-th order stationary Gaussian AR process, defined via the
difference equation

$$y_t = -\sum_{k=1}^{n} a_k y_{t-k} + u_t, \tag{1}$$

where $\{u_t\}$ is a stationary zero-mean Gaussian white noise with variance
$\sigma^2$. Let us denote the process covariances by

$$r_k = r_{-k} = E\{y_t y_{t-k}\}, \quad -\infty < k < \infty. \tag{2}$$

The covariances are known to satisfy the so-called Yule-Walker equations

$$r_\ell + \sum_{k=1}^{n} a_k r_{\ell-k} = \begin{cases} \sigma^2, & \ell = 0 \\ 0, & \ell > 0 \end{cases} \tag{3}$$

Let $R_N$ denote the N x N symmetric Toeplitz matrix (where N > n)

$$\{R_N\}_{i,j} = r_{i-j}, \quad 1 \le i,j \le N. \tag{4}$$

Also denote

$$S_N = R_N^{-1}. \tag{5}$$
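Equations (1) to (4) translate directly into a small routine for the process covariances. The following is a minimal sketch, assuming a stable AR polynomial; the function names are hypothetical. It solves the Yule-Walker equations (3) for lags 0 to n as a linear system and then extends the sequence recursively.

```python
import numpy as np

def ar_covariances(a, sigma2, max_lag):
    """Covariances r_0..r_max_lag of y_t = -sum_k a[k-1]*y_{t-k} + u_t (stable AR assumed)."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    c = np.concatenate(([1.0], a))             # c_0 = 1, c_k = a_k
    # Equations (3) for l = 0..n, using r_{-k} = r_k, as a linear system in r_0..r_n.
    A = np.zeros((n + 1, n + 1))
    for l in range(n + 1):
        for k in range(n + 1):
            A[l, abs(l - k)] += c[k]
    rhs = np.zeros(n + 1)
    rhs[0] = sigma2
    r = list(np.linalg.solve(A, rhs))
    # Extend to higher lags with the l > 0 branch of (3).
    for l in range(n + 1, max_lag + 1):
        r.append(-sum(a[k - 1] * r[l - k] for k in range(1, n + 1)))
    return np.array(r[: max_lag + 1])

def covariance_matrix(a, sigma2, N):
    """N x N symmetric Toeplitz matrix R_N of (4)."""
    r = ar_covariances(a, sigma2, N - 1)
    idx = np.arange(N)
    return r[np.abs(idx[:, None] - idx[None, :])]

print(ar_covariances([0.5], 2.0, 3))
```

For a first order process the routine reproduces the familiar closed form $r_k = \sigma^2 (-a)^{|k|} / (1 - a^2)$.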

The  following  three  lemmas  will  be  needed  to  derive  the  main  result. 

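Lemma 1: The matrix $(\sigma^2)^{-1} R_N$ admits the Cholesky (lower/upper) factorization

$$(\sigma^2)^{-1} R_N = L_N^{-1} L_N^{-T}, \tag{6}$$

where

$$\{L_N\}_{i,j} = \begin{cases} \{L_n\}_{i,j}, & 1 \le i,j \le n \\ a_{i-j}, & i > n, \ 0 \le i-j \le n \quad (a_0 = 1) \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

and $L_n^{-1}$ is the lower Cholesky factor of $(\sigma^2)^{-1} R_n$.

The lemma is proven by computing $(\sigma^2)^{-1} L_N R_N$ and using eq. (3). The
result turns out to be an upper triangular matrix with 1's along the diagonal
elements starting at the (n+1,n+1) position. Equation (6) then follows from
the uniqueness of the Cholesky decomposition.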


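Lemma 2: The inverse covariance matrix $S_n$ ($= R_n^{-1}$) is given by the expression

$$S_n = (\sigma^2)^{-1} (A_1 A_1^T - A_2 A_2^T), \tag{8}$$

where $A_1$ and $A_2$ are the lower triangular Toeplitz matrices (with $a_0 = 1$)

$$\{A_1\}_{i,j} = \begin{cases} a_{i-j}, & i \ge j \\ 0, & i < j \end{cases} \qquad \{A_2\}_{i,j} = \begin{cases} a_{n-i+j}, & i \ge j \\ 0, & i < j \end{cases} \tag{9}$$

This is the so-called Gohberg-Semencul formula, proven e.g., in [2].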
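The Gohberg-Semencul formula can be verified numerically against a direct matrix inverse. The sketch below is illustrative only. It assumes the sign convention of (1), takes $a_0 = 1$, and also covers N > n by setting $a_k = 0$ for k > n, which is the usual zero-padded form of the formula and is an assumption beyond the lemma as stated. All function names are hypothetical.

```python
import numpy as np

def ar_covariance_matrix(a, sigma2, N):
    """Toeplitz covariance matrix of y_t = -sum_k a[k-1]*y_{t-k} + u_t (stable AR assumed)."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    c = np.concatenate(([1.0], a))
    A = np.zeros((n + 1, n + 1))
    for l in range(n + 1):                     # Yule-Walker equations for lags 0..n
        for k in range(n + 1):
            A[l, abs(l - k)] += c[k]
    r = list(np.linalg.solve(A, sigma2 * np.eye(n + 1)[0]))
    for l in range(n + 1, N):                  # recursion for higher lags
        r.append(-sum(a[k - 1] * r[l - k] for k in range(1, n + 1)))
    r = np.array(r)
    idx = np.arange(N)
    return r[np.abs(idx[:, None] - idx[None, :])]

def lower_toeplitz(first_col):
    """Lower triangular Toeplitz matrix with the given first column."""
    col = np.asarray(first_col, dtype=float)
    N = len(col)
    d = np.arange(N)[:, None] - np.arange(N)[None, :]
    return np.where(d >= 0, col[np.clip(d, 0, N - 1)], 0.0)

def gohberg_semencul_inverse(a, sigma2, N):
    """Inverse covariance via (A1 A1^T - A2 A2^T)/sigma2; requires N >= len(a)."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    c = np.zeros(N + 1)
    c[0] = 1.0
    c[1:n + 1] = a                             # a_k = 0 for k > n (zero padding)
    A1 = lower_toeplitz(c[:N])                 # first column 1, a_1, ..., a_{N-1}
    A2 = lower_toeplitz(c[N:0:-1])             # first column a_N, ..., a_1
    return (A1 @ A1.T - A2 @ A2.T) / sigma2

R = ar_covariance_matrix([-1.0, 0.5], 1.5, 2)
print(np.allclose(np.linalg.inv(R), gohberg_semencul_inverse([-1.0, 0.5], 1.5, 2)))
```

With N = n the routine evaluates exactly the expression of the lemma; larger N exercises the zero-padded variant.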


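Lemma 3: Let Y be a zero-mean Gaussian vector whose covariance matrix
$\Lambda$ depends on a vector of parameters $\theta$ of dimension m. Then the m×m Fisher
information matrix of Y is given by

$$\{J\}_{k,\ell} = \frac{1}{2} \mathrm{tr} \left\{ \frac{\partial \Lambda^{-1}}{\partial \theta_k} \Lambda \frac{\partial \Lambda^{-1}}{\partial \theta_\ell} \Lambda \right\}, \tag{10}$$

where tr{.} denotes the trace operator.


This formula can be obtained by direct computation, or see e.g., [3].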


We now state and prove the main result, as follows:


Theorem 1: The Fisher information matrix corresponding to N consecutive
measurements of the given AR process (where N > n), is given by the exact
expression

$$J_N = \bar J + (N-n) \begin{bmatrix} (2\sigma^4)^{-1} & 0 \\ 0 & (\sigma^2)^{-1} R_n \end{bmatrix}, \tag{11}$$

where $\bar J$ is a constant matrix whose elements are given by

$$\{\bar J\}_{1,1} = n (2\sigma^4)^{-1}, \tag{12a}$$

(12b)

(12c)

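Proof: By (5), (6) we have

$$S_N = (\sigma^2)^{-1} L_N^T L_N. \tag{13}$$

Hence

$$\frac{\partial S_N}{\partial \sigma^2} = -(\sigma^4)^{-1} L_N^T L_N, \tag{14}$$

$$\frac{\partial S_N}{\partial a_k} = (\sigma^2)^{-1} \frac{\partial L_N^T}{\partial a_k} L_N + (\sigma^2)^{-1} L_N^T \frac{\partial L_N}{\partial a_k}, \tag{15}$$

$$\frac{\partial S_N}{\partial \sigma^2} R_N = -(\sigma^2)^{-1} I, \tag{16}$$

$$\frac{\partial S_N}{\partial a_k} R_N = \frac{\partial L_N^T}{\partial a_k} L_N^{-T} + L_N^T \frac{\partial L_N}{\partial a_k} L_N^{-1} L_N^{-T}. \tag{17}$$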

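After some calculations (using the commutativity of the trace operator and its
invariance under transposition) we get:

$$\frac{1}{2} \mathrm{tr} \left\{ \frac{\partial S_N}{\partial \sigma^2} R_N \frac{\partial S_N}{\partial \sigma^2} R_N \right\} = (2\sigma^4)^{-1} N, \tag{18}$$

$$\frac{1}{2} \mathrm{tr} \left\{ \frac{\partial S_N}{\partial \sigma^2} R_N \frac{\partial S_N}{\partial a_k} R_N \right\} = -(\sigma^2)^{-1} \mathrm{tr} \left\{ L_N^{-1} \frac{\partial L_N}{\partial a_k} \right\}, \tag{19}$$

$$\frac{1}{2} \mathrm{tr} \left\{ \frac{\partial S_N}{\partial a_k} R_N \frac{\partial S_N}{\partial a_\ell} R_N \right\} = \mathrm{tr} \left\{ \frac{\partial L_N}{\partial a_k} L_N^{-1} \frac{\partial L_N}{\partial a_\ell} L_N^{-1} \right\} + (\sigma^2)^{-1} \mathrm{tr} \left\{ \frac{\partial L_N^T}{\partial a_k} \frac{\partial L_N}{\partial a_\ell} R_N \right\}. \tag{20}$$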

Now, the partial derivatives of $L_N$ are given by


where $Z_m$ is the down shift matrix

$$\{Z_m\}_{i,j} = \begin{cases} 1; & i - j = m \\ 0; & \text{otherwise} \end{cases}$$

Since both $L_N$ and its partial derivatives are lower triangular, and since the
last N-n diagonal entries of $\partial L_N / \partial a_k$ are zero, we get


-l^X-N 


1 


‘  IT  J 


(23) 


Also, 


da^  da  "H 


aJ  dL„  'aJ  aL„ 

aaj^  aa^  n,  aa^  aa^  '^12 

------I------ 


*  ^n-lc 


Hence 


all  aLj,  bil  aL„ 

•  tr{^^R  }  +  (N-n)r. 
^aa,^  aa^  ^aaj^  aa^  '  k- 


Substituting  (24),  (25)  in  (20)  yields 


itrf^R  ^R  1 

T  '*34^  "»  as, 


5^;  aL„ 


■  '^I^“asr  Is:  I  *  '^(asT^  M 


Finally  we  see  that  (10)  and  (18)  yield  (12a);  (10),  (19)  and  (23)  yield 
(12b);  and  (10),  (20)  and  (27)  yield  (12c). 


Corollary: The exact Cramer-Rao bound on any unbiased estimate of the AR
parameters is given by $J_N^{-1}$, where $J_N$ is given by (11), (12).


Comments:


(1) Note that formulas (11), (12) are actually closed-form expressions
for $J_N$. $R_n^{-1}$ is given by (cf. (8))

$$S_n = (\sigma^2)^{-1} (A_1 A_1^T - A_2 A_2^T),$$

while $\partial S_n / \partial a_k$ is given by

$$\frac{\partial S_n}{\partial a_k} = (\sigma^2)^{-1} [Z_k A_1^T + A_1 Z_k^T - Z_{n-k} A_2^T - A_2 Z_{n-k}^T].$$


(2) When N → ∞, the constant matrix $\bar J$ in (11) becomes negligible with
respect to the second term. Hence we get the well known asymptotic
result [1]

$$\lim_{N \to \infty} N^{-1} J_N = \begin{bmatrix} (2\sigma^4)^{-1} & 0 \\ 0 & (\sigma^2)^{-1} R_n \end{bmatrix}. \tag{30}$$


(3)  The  difference  between  the  exact  information  and  its  asymptotic 
approximation  (30)  is  not  necessarily  either  positive  or  negative 
definite,  but  can  be  indefinite  in  general.  Therefore  the  exact  CRB 
of  the  AR  parameters  for  short  data  records  can  be  either  larger  or 
smaller  than  its  corresponding  asymptotic  approximation.  Some 
examples  are  given  in  the  next  section. 
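The structure of (11), a constant matrix plus a term linear in N, can be observed numerically by evaluating the Gaussian Fisher information directly. The sketch below is an illustration under stated assumptions: parameters are ordered as $(\sigma^2, a_1, \ldots, a_n)$, the derivatives of $R_N$ are approximated by central differences, and the trace is written in the equivalent form $\frac{1}{2}\mathrm{tr}\{\Lambda^{-1} \partial_k \Lambda \, \Lambda^{-1} \partial_\ell \Lambda\}$. Function names are hypothetical.

```python
import numpy as np

def ar_covariance_matrix(theta, N):
    """R_N for theta = [sigma2, a_1, ..., a_n], with y_t = -sum_k a_k*y_{t-k} + u_t."""
    sigma2, a = float(theta[0]), np.asarray(theta[1:], dtype=float)
    n = len(a)
    c = np.concatenate(([1.0], a))
    A = np.zeros((n + 1, n + 1))
    for l in range(n + 1):                      # Yule-Walker equations for lags 0..n
        for k in range(n + 1):
            A[l, abs(l - k)] += c[k]
    r = list(np.linalg.solve(A, sigma2 * np.eye(n + 1)[0]))
    for l in range(n + 1, N):
        r.append(-sum(a[k - 1] * r[l - k] for k in range(1, n + 1)))
    r = np.array(r)
    idx = np.arange(N)
    return r[np.abs(idx[:, None] - idx[None, :])]

def fisher_information(theta, N, eps=1e-6):
    """Exact Gaussian Fisher information for N samples (numerical derivatives of R_N)."""
    theta = np.asarray(theta, dtype=float)
    R_inv = np.linalg.inv(ar_covariance_matrix(theta, N))
    prods = []
    for k in range(len(theta)):
        step = np.zeros_like(theta)
        step[k] = eps
        dR = (ar_covariance_matrix(theta + step, N)
              - ar_covariance_matrix(theta - step, N)) / (2.0 * eps)
        prods.append(R_inv @ dR)                # R^{-1} dR/dtheta_k
    m = len(theta)
    return np.array([[0.5 * np.trace(prods[k] @ prods[l]) for l in range(m)]
                     for k in range(m)])

theta = np.array([1.0, -1.0, 0.5])              # hypothetical AR(2): sigma2, a_1, a_2
increment = fisher_information(theta, 6) - fisher_information(theta, 5)
print(np.round(increment, 5))
```

The printed increment per additional sample matches the block matrix in (11) and (30), with $(2\sigma^4)^{-1}$ in the corner and $(\sigma^2)^{-1} R_n$ in the lower block.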


3. SOME EXAMPLES


Let us first consider the case of a first order AR process with parameters
$\{\sigma^2, a\}$. In this case we get*

$$J_N = \begin{bmatrix} \dfrac{N}{2\sigma^4} & \dfrac{a}{\sigma^2(1-a^2)} \\ \dfrac{a}{\sigma^2(1-a^2)} & \dfrac{2a^2}{(1-a^2)^2} + \dfrac{N-1}{1-a^2} \end{bmatrix}. \tag{31}$$

It is of interest to examine the ratio of the exact CRB's on $\sigma^2$ and a to
their respective asymptotic approximations. We denote by $b_N(\theta_k)$ the exact
CRB of the parameter $\theta_k$, and by $\bar b_N(\theta_k)$ the asymptotic approximation of the
bound. Inverting the matrix in (31) and using the diagonal entries of the
inverse we get

$$\frac{b_N(\sigma^2)}{\bar b_N(\sigma^2)} = \frac{N(N-1)(1-a^2) + 2Na^2}{N(N-1)(1-a^2) + 2(N-1)a^2}, \tag{32}$$

$$\frac{b_N(a)}{\bar b_N(a)} = \frac{N^2(1-a^2)}{N(N-1)(1-a^2) + 2(N-1)a^2}. \tag{33}$$

Eq. (32) is always greater than 1. From eq. (33),

$$\frac{b_N(a)}{\bar b_N(a)} < 1 \quad \text{if and only if} \quad |a| > \left( \frac{N}{3N-2} \right)^{1/2}. \tag{34}$$

Thus we distinguish among three different cases:

(i) $|a| > (2)^{-1/2}$; in this case the exact bound on a is always
smaller than the asymptotic bound.

(ii) $|a| < (3)^{-1/2}$; in this case the exact bound is always greater
than the asymptotic bound.

(iii) $(3)^{-1/2} < |a| < (2)^{-1/2}$; in this case the exact bound is greater
than the asymptotic bound for small values of N, and then changes
direction and becomes smaller than the asymptotic bound for large
values of N.

*This formula was also given in [4].
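The three cases can be reproduced by inverting the first order information matrix numerically. The sketch below is a small illustration with hypothetical function names; it takes the parameter ordering $(\sigma^2, a)$ and the asymptotic bounds $2\sigma^4/N$ and $(1-a^2)/N$ that follow from the asymptotic result above.

```python
import numpy as np

def exact_information_ar1(a, sigma2, N):
    """First order information matrix for parameters ordered as (sigma2, a)."""
    off = a / (sigma2 * (1.0 - a * a))
    return np.array([[N / (2.0 * sigma2 ** 2), off],
                     [off, 2.0 * a * a / (1.0 - a * a) ** 2 + (N - 1) / (1.0 - a * a)]])

def crb_ratios_ar1(a, N, sigma2=1.0):
    """Ratios of exact to asymptotic CRB for (sigma2, a)."""
    exact = np.diag(np.linalg.inv(exact_information_ar1(a, sigma2, N)))
    asymptotic = np.array([2.0 * sigma2 ** 2 / N, (1.0 - a * a) / N])
    return exact / asymptotic

for a in (0.5, 0.65, 0.8):                      # one value from each of the three cases
    print(a, [round(crb_ratios_ar1(a, N)[1], 4) for N in (2, 5, 20, 100)])
```

For a = 0.5 the ratio for a stays above 1, for a = 0.8 it stays below 1, and for a = 0.65 it starts above 1 and drops below 1 once N exceeds 3.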

A similar behavior is observed for second order AR processes. The explicit
formulas are too complicated to analyze by inspection, so one has to resort to
numerical evaluation of the CRB's. We tested several second order processes
with complex poles of varying magnitudes and a constant phase angle of 45°.
Figures 1, 2 and 3 show the results for $\{\sigma^2, a_1, a_2\}$, for the three test
cases specified in Table 1.

Table  1:  Three  Test  Cases  of  Second  Order  Processes 


Case no. 1 corresponds to a narrowband process, while case no. 3 corresponds to a
broadband process. As we see, the exact bound on $\sigma^2$ is always greater than
the asymptotic approximation. The behavior of the bounds on the AR parameters
$a_1$ and $a_2$ depends on the nature of the process. It appears that for
narrowband processes the exact CRB's approach the asymptotic approximations
from below, while the opposite is true for broadband processes. In case no.
2, which represents an intermediate bandwidth process, the behavior of the
bound changes direction as N increases.

4. CONCLUSIONS

An explicit formula was derived for the Cramer-Rao bound on unbiased
estimates of the parameters of Gaussian AR processes. The formula contains a
term  linear  in  the  number  of  measurements,  plus  a  constant  term.  The 
additional  constant  term  is  indefinite  in  general,  so  the  exact  CRB  can  be 
either  larger  or  smaller  than  the  corresponding  asymptotic  approximation. 

A common problem in random signal processing is that of estimating
narrowband  signals  from  short  data  records.  We  have  demonstrated  that  in  such 
situations,  the  actual  CRB  can  be  much  smaller  than  the  asymptotic 
approximation.  It  is  therefore  recommended  that  in  analyzing  AR  algorithms 
for  short  data  records,  comparison  should  be  made  to  the  bound  derived  here, 
rather  than  to  the  more  commonly  used  asymptotic  bound. 

We  finally  note  that  the  result  derived  in  this  note  apparently  does  not 
carry  over  to  moving-average  and  ARMA  processes.  These  two  models  are  not 
linear  regressions,  hence  the  information  matrix  is  not  likely  to  depend 
linearly on the number of data points. Formula (10) can still be used to
compute $J_N$ for any desired value of N. However, the amount of computations is
proportional  to  ,  so  this  may  not  be  convenient  in  practice.  Asymptotic 
CRB formulas for the ARMA and AR-plus-Noise cases can be found in [5].

Figure 2: The Ratio of the Bounds on a


Figure 3: The Ratio of the Bounds on a

ON INSTRUMENTAL VARIABLE ESTIMATION OF SINUSOID FREQUENCIES
AND  THE  PARSIMONY  PRINCIPLE 

Petre  Stoica,  Benjamin  Friedlander,  and  Torsten  Soderstrom 


ABSTRACT 

Multiple sinusoids in noise can be modeled as an ARMA process with the AR
parameters satisfying certain symmetry relations. According to the "parsimony
principle" the constraints on the AR parameters should be taken into account
to get improved estimation accuracy. It is shown in this note that when
estimating the AR parameters by a general instrumental-variable method, such a
parsimony does not necessarily apply. However, the parsimony principle does
hold when an optimal instrumental-variable method is used.

1. INTRODUCTION

A sinusoids-in-noise process obeys an ARMA equation of a special
structure [1, 2, 11]. In particular, the AR parameters of this ARMA equation
possess  a  certain  symmetry  property  [1,5,11].  Taking  this  symmetry  into 
account  should  presumably  result  in  improved  estimation  accuracy.  Since  the 
AR  parameters  contain  complete  information  on  the  sinusoid  frequencies,  their 
accurate  estimation  is  important. 

A frequently used technique for estimating the AR parameters of an ARMA
process is the MYW method [1,2,3] which is closely related to the more general
class of instrumental variable (IV) methods [14]. Within this technique, the
symmetry which the AR parameters must satisfy may be ignored [2,3] or taken
into account [1,5,11]. The computational burdens that result in either case
are comparable [5]. However, we may expect that better accuracy should be
obtained in the second case. This would presumably follow from the so-called
"parsimony principle" [8,9].

Our aim here is to investigate this conjecture. Taking into account the
symmetry of the AR parameters may often lead to improved estimates. However,
we  show  by  means  of  a  counter-example  that  this  is  not  always  true,  contrary 
to  what  is  sometimes  stated  in  the  literature  [11].  We  also  show  that  if  an 
optimal  instrumental  variable  method  is  used  (closely  related  to  the  MYW 
method  with  an  optimal  weighting  matrix)  then  the  parsimony  principle  does 
hold. 

2.  MAIN  RESULTS 

Consider  the  following  sinusoidal  signal 


$$x(t) = \sum_{k=1}^{m} \alpha_k \sin(\omega_k t + \phi_k) \tag{2.1}$$


■r. 

V. 


where 

^  ^  *  *^k  ^  ^  ^  ^  ^  * 

$\omega_i \ne \omega_j$ for $i \ne j$

Let y(t) denote the noise-corrupted measurements of x(t),

$$y(t) = x(t) + e(t), \quad t = 1, 2, \ldots, \tag{2.2}$$

where e(t) is a sequence of independent and identically distributed random
variables with zero mean and variance $\lambda^2$. We assume that x(t) and e(s) are
uncorrelated for any t and s.

As is well known, x(t) obeys the following autoregressive equation
[1, 2, 11],

$$x(t) + a_1 x(t-1) + \ldots + a_n x(t-n) = 0, \quad n = 2m, \tag{2.3}$$

where $\{a_i\}$ are defined by

$$1 + a_1 z + \ldots + a_n z^n = \prod_{k=1}^{m} (1 - 2 \cos \omega_k \, z + z^2) \triangleq A(z). \tag{2.4}$$

Since A(z) has complex-conjugate unit-modulus roots, we must have

$$a_i = a_{n-i}, \quad i = 0, \ldots, m \quad (a_0 = 1). \tag{2.5}$$
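The symmetry of the AR coefficients implied by the factorization (2.4) is easy to confirm numerically. The sketch below multiplies the second order factors for a set of arbitrarily chosen, hypothetical frequencies and checks that the resulting coefficient vector is palindromic; the function name is an assumption.

```python
import numpy as np

def sinusoid_ar_polynomial(omegas):
    """Coefficients [1, a_1, ..., a_n] of A(z) in (2.4), in ascending powers of z."""
    coeffs = np.array([1.0])
    for w in omegas:
        coeffs = np.convolve(coeffs, [1.0, -2.0 * np.cos(w), 1.0])
    return coeffs

A = sinusoid_ar_polynomial([0.3 * np.pi, 0.55 * np.pi, 0.8 * np.pi])   # m = 3, n = 6
print(np.round(A, 4))
print(np.allclose(A, A[::-1]))       # a_i = a_{n-i}, with a_0 = a_n = 1
```

Because each factor is itself symmetric, the product is symmetric as well, so only $a_1, \ldots, a_m$ are free parameters.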


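It follows from (2.2) and (2.3) that

$$A(q^{-1}) y(t) = A(q^{-1}) e(t), \tag{2.6}$$

where $q^{-1}$ denotes the unit delay operator, which can be written as

$$y(t) = \phi^T(t) \theta + A(q^{-1}) e(t), \tag{2.7}$$

where

$$\phi(t) = -[y(t-1) \ \ldots \ y(t-n)]^T,$$

$$\theta = [a_1 \ \ldots \ a_n]^T.$$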


If the constraint (2.5) is taken into account, then (2.7) becomes

$$y(t) + y(t-n) = \psi^T(t) \alpha + A(q^{-1}) e(t),$$

where

$$\psi(t) = -[\{y(t-1) + y(t-n+1)\}, \ \ldots, \ \{y(t-m+1) + y(t-m-1)\}, \ \{y(t-m)\}]^T,$$

$$\alpha = [a_1 \ \ldots \ a_m]^T.$$


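Consider the following instrumental variable (IV) estimates of $\theta$:


(i) Unconstrained IV estimate

$$\hat\theta = \arg\min_{\theta} \left\| \left[ \frac{1}{N} \sum_{t=1}^{N} z(t) \phi^T(t) \right] \theta - \left[ \frac{1}{N} \sum_{t=1}^{N} z(t) y(t) \right] \right\|_Q^2, \quad Q > 0, \tag{2.11a}$$

where N denotes the number of data points and z(t) is the IV vector given by

$$z(t) = [y(t-n-1) \ \ldots \ y(t-n-M)]^T, \quad M \ge n, \tag{2.11b}$$

or equivalently, $\hat\theta$ is the least squares solution of

$$Q^{1/2} \left[ \frac{1}{N} \sum_{t=1}^{N} z(t) \phi^T(t) \right] \theta = Q^{1/2} \left[ \frac{1}{N} \sum_{t=1}^{N} z(t) y(t) \right]. \tag{2.12}$$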


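(ii) Constrained IV estimate

Note from (2.5) that

$$\theta = U \alpha + e_n, \tag{2.13a}$$

where

$$U = \begin{bmatrix} I_m \\ \tilde I_{m-1} \ \ 0 \\ 0 \ \cdots \ 0 \end{bmatrix} \quad (2m \times m), \tag{2.13b}$$

with $\tilde I_{m-1}$ denoting the (m-1)×(m-1) exchange (reversal) matrix, and

$$e_k = [0, \ \ldots, \ 0, \ 1]^T = \text{unit vector of length } k. \tag{2.13c}$$

The constrained estimate $\tilde\theta$ is defined by (2.11), under the
constraint (2.13). Thus,

$$\tilde\alpha = \arg\min_{\alpha} \left\| \left[ \frac{1}{N} \sum_{t=1}^{N} z(t) \psi^T(t) \right] \alpha - \left[ \frac{1}{N} \sum_{t=1}^{N} z(t) \{y(t) + y(t-n)\} \right] \right\|_Q^2, \qquad \tilde\theta = U \tilde\alpha + e_n, \tag{2.14}$$

or equivalently, $\tilde\alpha$ is the least-squares solution of

$$Q^{1/2} \left[ \frac{1}{N} \sum_{t=1}^{N} z(t) \psi^T(t) \right] \alpha = Q^{1/2} \left[ \frac{1}{N} \sum_{t=1}^{N} z(t) \{y(t) + y(t-n)\} \right]. \tag{2.15}$$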


Note that by the transformation in (2.13), we have converted the constrained
optimization problem for $\theta$ into an unconstrained optimization problem for
$\alpha$ (2.14). We could also have obtained $\tilde\theta$ by using the results of the theory
of least-squares regression with linear constraints [13]. However, the
formula provided by this theory for $\tilde\theta$, though equivalent to (2.14), is more
complicated [12,13].

The IV estimates (2.11) and (2.14) are asymptotically equivalent to some
MYW estimators which are easier to implement. There are various interesting
computational issues related to (2.11) and (2.14), or rather to the
asymptotically equivalent MYW estimators, for which we refer to [1-5, 10, 11].

We are interested in comparing the accuracies of the two estimates
$\hat\theta$ and $\tilde\theta$. To do this we rely on the following asymptotic results which
follow from the general theory developed in [6][7][12]. The asymptotic
distributions of the normalized IV estimation errors are given by:


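$$\sqrt{N} (\hat\theta - \theta) \xrightarrow{\text{distribution}} \mathcal{N}(0, P_{\hat\theta}), \tag{2.16}$$

where

$$P_{\hat\theta} = (R^T Q R)^{-1} R^T Q S Q R (R^T Q R)^{-1}, \tag{2.17}$$

$$R = E\{z(t) \phi^T(t)\}, \tag{2.18}$$

$$S = E\{A(q^{-1}) z(t) \cdot A(q^{-1}) z^T(t)\} = E\left\{ A(q^{-1}) \begin{bmatrix} e(t-1) \\ \vdots \\ e(t-M) \end{bmatrix} \cdot A(q^{-1}) [e(t-1) \ \ldots \ e(t-M)] \right\}, \tag{2.19}$$

and

$$\sqrt{N} (\tilde\alpha - \alpha) \xrightarrow{\text{distribution}} \mathcal{N}(0, P_{\tilde\alpha}), \tag{2.20}$$

where

$$P_{\tilde\alpha} = (\tilde R^T Q \tilde R)^{-1} \tilde R^T Q S Q \tilde R (\tilde R^T Q \tilde R)^{-1}, \tag{2.21}$$

$$\tilde R = E\{z(t) \psi^T(t)\} = R U. \tag{2.22}$$

The last result implies that

$$\sqrt{N} (\tilde\theta - \theta) \xrightarrow{\text{distribution}} \mathcal{N}(0, P_{\tilde\theta}), \tag{2.23}$$

where

$$P_{\tilde\theta} = U P_{\tilde\alpha} U^T = U (U^T R^T Q R U)^{-1} U^T R^T Q S Q R U (U^T R^T Q R U)^{-1} U^T. \tag{2.24}$$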


The covariance matrices $P_{\hat\theta}$ and $P_{\tilde\theta}$ depend on Q. It can be shown [12], [15]
that $P_{\hat\theta}$ and $P_{\tilde\theta}$ are bounded from below by

$$P^{o}_{\hat\theta} = (R^T S^{-1} R)^{-1} \tag{2.25}$$

and

$$P^{o}_{\tilde\theta} = U (U^T R^T S^{-1} R U)^{-1} U^T. \tag{2.26}$$

Furthermore, it is straightforward to show that the lower bounds above are
attained for

$$Q = S^{-1}. \tag{2.27}$$

The IV estimates (2.11), (2.14) with the optimal weighting matrix above are
called optimal IV estimators. For a discussion of their implementation see
[12], [16].
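The role of the weighting matrix can be illustrated with random stand-in matrices. The sketch below is not tied to any particular sinusoidal process: R and S are hypothetical, generated only so that S is positive definite, and the function name is an assumption. It evaluates the sandwich formula (2.17) for Q = I and for Q = S^{-1}.

```python
import numpy as np

def iv_covariance(R, S, Q):
    """P = (R^T Q R)^{-1} R^T Q S Q R (R^T Q R)^{-1} for a symmetric weighting Q."""
    RtQ = R.T @ Q
    gain = np.linalg.solve(RtQ @ R, RtQ)     # (R^T Q R)^{-1} R^T Q
    return gain @ S @ gain.T

rng = np.random.default_rng(1)
M, n = 5, 2                                   # hypothetical sizes (instruments, parameters)
R = rng.standard_normal((M, n))
B = rng.standard_normal((M, M))
S = B @ B.T + np.eye(M)                       # arbitrary positive definite S

P_identity = iv_covariance(R, S, np.eye(M))
P_optimal = iv_covariance(R, S, np.linalg.inv(S))
bound = np.linalg.inv(R.T @ np.linalg.inv(S) @ R)
print(np.allclose(P_optimal, bound))
print(np.linalg.eigvalsh(P_identity - bound).min() >= -1e-10)
```

With Q = S^{-1} the sandwich collapses to $(R^T S^{-1} R)^{-1}$, and any other weighting gives a covariance that is at least as large in the positive semidefinite sense.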

The above covariance matrices are useful in evaluating the performances
of the two estimates (2.11) and (2.14) in specific cases. For the case of
"standard" ARMA processes, an extensive analytical study of performance of the
MYW estimators has been reported recently in [4]. For the sinusoids-in-noise
process, no similar study of performance of estimates like (2.11) and (2.14)
appears to be available in the literature.

The question we want to address here is whether $P_{\hat\theta} \ge P_{\tilde\theta}$. First we note
that the problem under study is related to the theory of least-squares
regression with linear constraints. For Q = I, $P_{\tilde\theta}$ and $P_{\hat\theta}$ can be interpreted
as the covariance matrices of the constrained and unconstrained least-squares
estimates of the parameters of a regression model with R being the "regressor
matrix" and S being the covariance matrix of the residuals. It is then known
that $P_{\hat\theta} \ge P_{\tilde\theta}$ if S = I [13]. It was conjectured in [11] that $P_{\hat\theta} \ge P_{\tilde\theta}$ also for
the case under consideration where S ≠ I (and Q = I). However, no formal
analysis of the case S ≠ I seems to be available in the literature.

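In the following we show by means of a counterexample that for S ≠ I and
Q = I the inequality $P_{\hat\theta} \ge P_{\tilde\theta}$ does not necessarily hold. Note that
$P_{\hat\theta} \ge P_{\tilde\theta}$ implies that

$$P_{\hat\alpha} \triangleq [I_m \ \ 0] \, P_{\hat\theta} \, [I_m \ \ 0]^T \ge [I_m \ \ 0] \, P_{\tilde\theta} \, [I_m \ \ 0]^T = P_{\tilde\alpha}. \tag{2.28}$$

The next example shows that $P_{\hat\alpha} \ge P_{\tilde\alpha}$ does not always hold, thus contradicting
the inequality $P_{\hat\theta} \ge P_{\tilde\theta}$.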


Example: A Single Sinusoid-in-Noise Process

We evaluated the covariances $P_{\hat\alpha}$ and $P_{\tilde\alpha}$ for Q = I,
m = 1; $\alpha_1 = \sqrt{2}$; $\phi_1 = 0$; $\lambda^2 = 1$; M = 2, and $\omega \in [0.12\pi, 0.88\pi]$.
To evaluate $P_{\hat\alpha}$ or $P_{\tilde\alpha}$ we need to compute the covariances of x(t). These are
given by the well-known formula [10,11]

$$E\{x(t) x(t+k)\} = \sum_{j=1}^{m} \frac{\alpha_j^2}{2} \cos k\omega_j. \tag{2.29}$$

In Figure 1, we plot $\log P_{\hat\alpha}$ and $\log P_{\tilde\alpha}$ versus $\omega$. For $\omega < 0.2\pi$ and
$\omega > 0.8\pi$, the IV estimate (2.14) has much better accuracy than (2.11). The
poor accuracy of (2.11) for such values of $\omega$ was expected since the matrix
$R^T R$ in (2.17) is nearly singular for $\omega$ close to 0 or $\pi$.

For $\omega$ in the range $[0.2\pi, 0.8\pi]$, the accuracies of the two estimates
are comparable. Moreover, for $\omega$ in the intervals shown in Figure 1, the
estimate (2.11) is more accurate than (2.14), which concludes the
counterexample.


Next  we  show  that  the  parsimony  principle  does  apply  to  the  case  of 
optimal  IV  estimates,  in  the  sense  that 

>  P:  (2.30) 

e  6 

This follows from the theory of least-squares regression with linear constraints for the case of uncorrelated residuals [13]. The two matrices in (2.30) can be interpreted as the covariance matrices of the constrained and unconstrained least-squares estimates of the parameters of a regression with $S^{-1/2} R$ as the regressor matrix and with uncorrelated residuals. For the case considered here we have a simple proof of (2.30) which we include for completeness.


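Proof of (2.30):

$$(2.30) \iff R^T S^{-1} R - (R^T S^{-1} R)\,U\,(U^T R^T S^{-1} R\,U)^{-1}\,U^T (R^T S^{-1} R) \geq 0$$

$$\iff \begin{bmatrix} R^T S^{-1} R & (R^T S^{-1} R)\,U \\ U^T (R^T S^{-1} R) & U^T (R^T S^{-1} R)\,U \end{bmatrix} = \begin{bmatrix} I \\ U^T \end{bmatrix} (R^T S^{-1} R) \begin{bmatrix} I & U \end{bmatrix} \geq 0$$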
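The matrix inequalities used in the proof can be checked numerically on random data. The sketch below is only an illustration under stated assumptions: it takes the unconstrained covariance to be $(R^T S^{-1} R)^{-1}$ and the constrained one to be $U (U^T R^T S^{-1} R\,U)^{-1} U^T$ (which is what the first line of the proof implies after a congruence with $R^T S^{-1} R$), and the matrix sizes and random matrices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_theta, n_alpha = 8, 4, 2      # arbitrary illustrative sizes

R = rng.standard_normal((n_rows, n_theta))
G = rng.standard_normal((n_rows, n_rows))
S = G @ G.T + n_rows * np.eye(n_rows)   # symmetric positive definite
U = rng.standard_normal((n_theta, n_alpha))

A = R.T @ np.linalg.solve(S, R)         # R^T S^{-1} R
UAU_inv = np.linalg.inv(U.T @ A @ U)

P_unconstrained = np.linalg.inv(A)      # assumed form, see lead-in
P_constrained = U @ UAU_inv @ U.T       # assumed form, see lead-in

def min_eig(M):
    """Smallest eigenvalue of the symmetric part of M."""
    return float(np.linalg.eigvalsh(0.5 * (M + M.T)).min())

gap = P_unconstrained - P_constrained
schur = A - A @ U @ UAU_inv @ U.T @ A
block = np.block([[A, A @ U], [U.T @ A, U.T @ A @ U]])
factored = np.vstack([np.eye(n_theta), U.T]) @ A @ np.hstack([np.eye(n_theta), U])

print("min eig of covariance gap   :", min_eig(gap))
print("min eig of Schur complement :", min_eig(schur))
print("min eig of block matrix     :", min_eig(block))
print("block equals factored form  :", np.allclose(block, factored))
```

All three smallest eigenvalues should be nonnegative up to rounding. The gap is singular by construction, which is why the inequality is of the semidefinite kind.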


3.  CONCLUSIONS 

It was shown that the parsimony principle does not hold in general when an IV (or MYW) method is used to estimate the parameters of sinusoid-in-noise type models. However, when an optimal IV (or an optimal MYW) method is used, the parsimony principle does hold. This result is interesting from a theoretical standpoint and helps to clarify some conjectures made in the literature.
Figure 1: Comparison of the Accuracies of the Constrained and Unconstrained IV Estimates

ABSTRACT


This paper considers the problem of estimating the parameters of continuous-time stationary Gaussian processes with rational spectra, from uniformly sampled measurements. The sampled process is shown to be an autoregressive moving-average process, and explicit relationships between the parameters of the continuous-time and the sampled processes are derived.

These relationships are then used to derive a lower bound on the covariance of unbiased estimates of the continuous-time parameters, and on the generalized variance of such estimates. It is shown by some examples that the bound on the generalized variance depends on the sampling interval in a non-monotonic manner. In particular, for each specific set of parameters, there exists a sampling interval for which the lower bound is minimized.


This  work  was  supported  by  the  Army  Research  Office  under  contract  number 
DAAG29-83-C-0027. 


1.  INTRODUCTION 


Digital processing of continuous-time signals involves the sampling of these signals. Most often the sampling is uniform, i.e., the sampling interval is constant. In some cases the user is interested in modeling the sampled signal, rather than the original continuous-time signal, while in other cases a model of the continuous-time signal is required. A typical situation is that of a signal generated by a physical system whose mathematical model is known, but whose parameters are unknown (such as a mechanical system with unknown masses, viscosities and spring coefficients). In such cases the primary goal of the digital processing is to identify the parameters of the original continuous-time system.

In this paper we consider a special class of continuous-time signals: stationary Gaussian processes with rational spectra. It is well known that uniform sampling of such processes gives rise to autoregressive moving-average (ARMA) processes of order equal to the denominator degree of the spectral density of the continuous-time process. The achievable accuracy in estimating the parameters and the power spectral densities of ARMA processes was studied in [1],[2]. In this paper we give quantitative answers to the following questions: i) What is the achievable accuracy in estimating the parameters of the continuous-time spectral density from the sampled ARMA process? ii) How is the achievable accuracy affected by the choice of sampling rate?

We assume that the number of data points of the sampled signal is fixed, i.e., that the total interval over which data are collected is proportional to the sampling interval. This is a reasonable assumption since it is often desired to process data in batches of a fixed size. We also assume that the parameter estimation method used is unbiased, at least asymptotically (e.g., the maximum likelihood estimator). Under these assumptions, we show that the lower bound on the generalized variance of the continuous-time parameter estimates is a non-monotonic function of the sampling interval. Consequently, for any given set of parameters there exists a sampling interval for which the generalized variance is minimal. The range of sampling rates for which the generalized variance is nearly minimal (the flat region of the curve) can be small or large, depending on the characteristics of the given signal.


The outline of the paper is as follows. In section 2 we derive closed-form expressions for the parameters of the sampled process as a function of the parameters of the given process. In section 3 we derive a lower bound on the variances of unbiased estimates of the continuous-time parameters. In section 4 we illustrate the existence of an optimal sampling rate, and examine the dependence of the generalized variance on the sampling interval for some examples. In section 5 we discuss potential applications of the results of this paper. The reader may want to skip sections 2 and 3 and go directly to section 4 on the first reading.


2.  DISCRETIZATION  OF  THE  CONTINUOUS-TIME  SPECTRUM 


Let $x(t)$ be a continuous-time Gaussian stationary random process. The process is assumed to have zero mean and a rational power spectral density function

$$S_x(s) = \frac{\beta(s)\,\beta(-s)}{\alpha(s)\,\alpha(-s)} \qquad (1)$$

where

$$\alpha(s) = s^n + \alpha_1 s^{n-1} + \cdots + \alpha_n ; \qquad \beta(s) = \beta_1 s^{n-1} + \cdots + \beta_n .$$

Both polynomials are assumed to have all their roots in the left half plane. Also, to simplify the analysis, we restrict ourselves to the case where all the roots of $\alpha(s)$ are distinct. Note that the degree of $\beta(s)$ is strictly less than that of $\alpha(s)$. This means that $x(t)$ does not contain a white noise component.

Assume that $x(t)$ is sampled at multiples of the sampling interval $T$, to yield a discrete-time Gaussian stationary process $\{y_k\}$, where

$$y_k = x(kT), \qquad k = \ldots, -1, 0, 1, \ldots$$

Our aim is to derive an expression for the power spectral density of $\{y_k\}$, which we will denote by $S_y(z)$. As we will see, $\{y_k\}$ turns out to be an autoregressive moving-average (ARMA) process of order $(n, n-1)$.

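The continuous-time spectrum, being a symmetric function of $s$, can be decomposed as

$$S_x(s) = \frac{\gamma(s)}{\alpha(s)} + \frac{\gamma(-s)}{\alpha(-s)} \qquad (2)$$

where $\gamma(s) = \gamma_1 s^{n-1} + \cdots + \gamma_n$. The coefficients of $\gamma(s)$ can be obtained from those of $\alpha(s)$ and $\beta(s)$ by equating coefficients in the identity

$$\gamma(s)\,\alpha(-s) + \gamma(-s)\,\alpha(s) = \beta(s)\,\beta(-s) . \qquad (3)$$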
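Equating coefficients in the identity (3) amounts to solving a small linear system for the coefficients of $\gamma(s)$. The following is a minimal sketch of that step, assuming polynomials are stored as coefficient lists in descending powers of $s$, with $\alpha(s)$ monic of degree $n$ and $\beta(s)$ of degree $n-1$; the helper names `reflect` and `solve_gamma` are made up for this illustration.

```python
import numpy as np

def reflect(p):
    """Coefficients of p(-s), given those of p(s) in descending powers."""
    p = np.asarray(p, dtype=float)
    powers = len(p) - 1 - np.arange(len(p))
    return p * (-1.0) ** powers

def solve_gamma(alpha, beta):
    """Solve gamma(s) alpha(-s) + gamma(-s) alpha(s) = beta(s) beta(-s).

    alpha: [1, alpha_1, ..., alpha_n]; beta: [beta_1, ..., beta_n].
    Returns [gamma_1, ..., gamma_n].
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n = len(alpha) - 1
    if len(beta) != n:
        raise ValueError("beta must have exactly n coefficients")
    # Right-hand side, padded to degree 2n-1 (length 2n).
    rhs = np.concatenate([[0.0], np.convolve(beta, reflect(beta))])
    # Column j is the left-hand side evaluated for the j-th basis polynomial.
    M = np.zeros((2 * n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        M[:, j] = np.convolve(e, reflect(alpha)) + np.convolve(reflect(e), alpha)
    # Both sides are even in s; keep the n rows belonging to even powers.
    even_rows = np.arange(1, 2 * n, 2)
    return np.linalg.solve(M[even_rows, :], rhs[even_rows])

# First-order check: alpha(s) = s + a, beta(s) = b gives gamma_1 = b^2 / (2a).
print(solve_gamma([1.0, 2.0], [3.0]))
# Second-order illustration: alpha(s) = s^2 + s + 1, beta(s) = s + 1.
print(solve_gamma([1.0, 1.0, 1.0], [1.0, 1.0]))
```

The linear system is square because both sides of (3) are even polynomials of degree at most $2n-2$, which leaves exactly $n$ equations for the $n$ unknown coefficients.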
Let us introduce the following notation


We can now use the inverse Laplace transform to get the autocorrelation function of $x(t)$.


n  \-l-cl. 


(7) 


The covariance sequence of the discrete-time process $\{y_k\}$ is obtained by sampling at multiples of the interval $T$,


’■/l)  ‘  ■  Jj  • 


(8) 


where

$$\mu_m = e^{\lambda_m T}, \qquad 1 \le m \le n .$$

The power spectral density of $\{y_k\}$ is defined as the z-transform of the covariance sequence.


S  (2)  •  i  r 


upI 


•  i.yi 


m=l  ji»— 
(9)


As we see, $S_y(z)$ is a rational function of $z$ having a reciprocal symmetry. By bringing the terms of the right-hand side of (9) under common denominator, we get

$$S_y(z) = \frac{e(z)}{a(z)\,a(z^{-1})} \qquad (10)$$

where

$$a(z) = 1 + a_1 z + \cdots + a_n z^n = (1-\mu_1 z)(1-\mu_2 z)\cdots(1-\mu_n z)$$

$$e(z) = e_{n-1} z^{n-1} + \cdots + e_1 z + e_0 + e_1 z^{-1} + \cdots + e_{n-1} z^{-(n-1)}$$
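The autoregressive polynomial $a(z)$ of the sampled process follows directly from the continuous-time poles through the mapping $\mu_m = e^{\lambda_m T}$. The sketch below illustrates this under the assumption that $\alpha(s)$ is given by its coefficients in descending powers of $s$; the function name and the example values ($\alpha(s) = s^2 + s + 1$, $T = 0.9$) are chosen only for illustration.

```python
import numpy as np

def sampled_ar_polynomial(alpha, T):
    """Map the roots lambda_m of alpha(s) to mu_m = exp(lambda_m * T) and
    expand a(z) = prod_m (1 - mu_m z).

    Returns (lam, mu, a) with a = [1, a_1, ..., a_n] in ascending powers of z.
    """
    lam = np.roots(alpha)                 # continuous-time poles
    mu = np.exp(lam * T)                  # discretized poles
    a = np.array([1.0 + 0.0j])
    for mu_m in mu:
        a = np.convolve(a, np.array([1.0, -mu_m]))
    # For real alpha(s) the roots come in conjugate pairs, so a(z) is real.
    return lam, mu, a.real

lam, mu, a = sampled_ar_polynomial([1.0, 1.0, 1.0], T=0.9)
print("lambda:", lam)
print("mu    :", mu)
print("a(z)  :", a)
```

For a stable $\alpha(s)$ all $|\mu_m|$ are below one, and the last coefficient equals $(-1)^n e^{-\alpha_1 T}$, which gives a quick sanity check on the output.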
Note that $e(z)$ possesses reciprocal symmetry. Also, from (9) it is clear that $e(z)$ is positive for all $z = e^{j\omega}$, $-\pi \le \omega \le \pi$. Hence $e(z)$ can be factored as

$$e(z) = \sigma^2\, b(z)\, b(z^{-1}) ,$$

where

$$b(z) = 1 + b_1 z + \cdots + b_{n-1} z^{n-1} = (1-\nu_1 z)\cdots(1-\nu_{n-1} z) ,$$

and all $\{\nu_m, 1 \le m \le n-1\}$ have magnitudes strictly less than one. Finally, the discrete-time power spectral density is given by

$$S_y(z) = \sigma^2\, \frac{\prod_{m=1}^{n-1} (1-\nu_m z)(1-\nu_m z^{-1})}{\prod_{m=1}^{n} (1-\mu_m z)(1-\mu_m z^{-1})} . \qquad (12)$$

As we see, the discrete-time process $\{y_k\}$ can be modeled as an ARMA process of order $(n, n-1)$,

$$y_k + a_1 y_{k-1} + \cdots + a_n y_{k-n} = u_k + b_1 u_{k-1} + \cdots + b_{n-1} u_{k-n+1} ,$$

where $\{u_k\}$ is the innovation process of $\{y_k\}$ and $\sigma^2$ is the variance of $u_k$.
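The factorization of $e(z)$ into $\sigma^2 b(z) b(z^{-1})$ can be carried out numerically by root finding. The sketch below is one simple way to do it, not necessarily the procedure used by the authors: it assumes $e(z)$ is given by its one-sided coefficients $[e_0, e_1, \ldots, e_{n-1}]$, that $e_{n-1} \neq 0$, and that no root lies on the unit circle. The function name is hypothetical.

```python
import numpy as np

def factor_symmetric_polynomial(e):
    """Factor e(z) = sigma2 * b(z) * b(1/z) for a reciprocal-symmetric e(z).

    e: [e_0, e_1, ..., e_{n-1}] (coefficients of z^0, z^{+-1}, ...).
    Returns (nu, b, sigma2) with b = [1, b_1, ...] in ascending powers of z.
    """
    e = np.asarray(e, dtype=float)
    # z^{n-1} e(z) in descending powers: e_{n-1}, ..., e_1, e_0, e_1, ..., e_{n-1}
    full = np.concatenate([e[:0:-1], e])
    roots = np.roots(full)
    # Roots come in reciprocal pairs (nu, 1/nu); keep those inside the unit circle.
    nu = roots[np.abs(roots) < 1.0]
    b = np.array([1.0 + 0.0j])
    for nu_m in nu:
        b = np.convolve(b, np.array([1.0, -nu_m]))
    b = b.real
    # The z^0 coefficient of sigma2 * b(z) b(1/z) is sigma2 * sum(b_k^2).
    sigma2 = e[0] / np.sum(b**2)
    return nu, b, sigma2

# Illustration: e(z) = 2 (1 - 0.5 z)(1 - 0.5 / z) = 2.5 - z - 1/z.
nu, b, sigma2 = factor_symmetric_polynomial([2.5, -1.0])
print(nu, b, sigma2)
```

For the illustrative input the recovered values should be a single root near 0.5, the polynomial $1 - 0.5z$, and a variance near 2.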


3.  THE  BOUND  ON  THE  VARIANCE  OF  THE  ESTIMATES


As we saw in the previous section, the spectral density of the sampled process depends on the parameters $\{\sigma^2, \mu_m, \nu_m\}$, which in turn depend on the parameters $\{\alpha_m, \beta_m\}$ of the continuous-time process. Suppose we have $N$ consecutive measurements of the sampled process, say $\{y_k, 1 \le k \le N\}$. Since $\{y_k\}$ is an ARMA process, the parameters $\{\sigma^2, a_m, b_m\}$, or equivalently $\{\sigma^2, \mu_m, \nu_m\}$, can be estimated by any of several available techniques (such as maximum likelihood, nonlinear least-squares, pseudo-linear regression). The estimated values of $\{\alpha_m, \beta_m\}$ can then be computed by reversing the procedure described in the previous section. Our aim here is to examine the best possible performance of such a procedure, i.e., to derive a lower bound on the variances of the estimates $\{\hat\alpha_m, \hat\beta_m\}$.

Let  us  denote  by  the  parameter  vector 

r  2,T 

9  »  v^,  ....  v„.i,  a  J  . 

The large-sample Fisher information matrix of $\theta^{(1)}$, corresponding to $N$ measurements, is given by [3, p. 242]
The Cramer-Rao lower bound on unbiased estimates of $\theta^{(1)}$ is given by the inverse of the information matrix.


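We are interested in deriving an expression for the Cramer-Rao bound for the parameter vector

$$\theta^{(6)} = [\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n]^T .$$

This is related to the CRB for $\theta^{(1)}$ via the formula [4, p. 194]

$$\mathrm{CRB}\{\theta^{(6)}\} = \left[\frac{\partial\theta^{(6)}}{\partial\theta^{(1)}}\right] \mathrm{CRB}\{\theta^{(1)}\} \left[\frac{\partial\theta^{(6)}}{\partial\theta^{(1)}}\right]^H$$

where $(\cdot)^H$ denotes Hermitian transpose. Rather than evaluating the Jacobian $\partial\theta^{(6)}/\partial\theta^{(1)}$ directly, it will be convenient to introduce four intermediate vectors, as follows: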

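$$\theta^{(2)} = [\mu_1, \ldots, \mu_n, \delta_1, \ldots, \delta_n]^T$$

$$\theta^{(3)} = [\lambda_1, \ldots, \lambda_n, \delta_1, \ldots, \delta_n]^T$$

$$\theta^{(4)} = [\lambda_1, \ldots, \lambda_n, \gamma_1, \ldots, \gamma_n]^T$$

$$\theta^{(5)} = [\alpha_1, \ldots, \alpha_n, \gamma_1, \ldots, \gamma_n]^T$$

Then we have

$$\frac{\partial\theta^{(6)}}{\partial\theta^{(1)}} = \frac{\partial\theta^{(6)}}{\partial\theta^{(5)}}\, \frac{\partial\theta^{(5)}}{\partial\theta^{(4)}}\, \frac{\partial\theta^{(4)}}{\partial\theta^{(3)}}\, \frac{\partial\theta^{(3)}}{\partial\theta^{(2)}}\, \frac{\partial\theta^{(2)}}{\partial\theta^{(1)}} \qquad (17)$$

Next we derive explicit expressions for the various partial derivatives in (17). Recall that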
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Multiplying  both  sides  by  and 


2 


-1 


hence 


log&  »  logo^  +  i  1og(l-v„u^)  +  I  log(l- 
in-1  tn-1 


substi tuting 
) 


Differentiating  (20)  gives 


Equations  (21a)  -  (21d)  provide  the  quantities  needed  to  evaluate 
.  Next  recall  that 

y\J 

-  e 


ilk  ■ 

j  0  ; 


A  ■  k 


A  *  k 


We  can  now  evaluate  ae^^Vae^^^  •  Next  note  that 


.n-1  . 

*  ••• 


and  therefore 


109  \  ■  logiTjxjJ'^  ♦  •••  *  r„)  -  I  ’<>9(vV 
Differentiating (25) yields


l_£fk  ^  j _ 

®k 


A  k 


1  ^  *  •••  ^n-1  _  H  1 

^  ^  ...  +  Yn  K’V 

1  ®*k  _  ^  ^ 

37  af  — f1=I - 
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(26a) 


(26b) 


(26c) 


Equations  (26a)  ■  (26c)  make  It  possible  to  evaluate  ae^^Vae^^^  and  thus 


ae^^Vae^^^  • 


Next we note that

$$\prod_{m=1}^{n} (s - \lambda_m) = s^n + \alpha_1 s^{n-1} + \cdots + \alpha_n .$$

Differentiating the equation above gives


n 


mvil 


.n-A 
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and  substituting  s  *  yields 


" ‘W 


This  makes  It  posslhle  to  evaluate  59**'/a6*®'  and  hence  ae’^’/ae' 
Finally,  recall  that 


X 


^  A  ^8  8  • 


Differentiation  yields 


•1  dA 


S- .  1  A-l  II- 9  »  1  ,-1.  5L  .  a-l  iB_ 


58,-7 


58,i*? 
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where 


,aA  ,  J  =  '  ‘  2'--'  ‘  " 


;  otherwl se 


;  1  <  21-a  <  n 
;  otherwise 


From  equations  {31a)-(31d)  we  can  evaluate  ,  and  thus 


ae'«'/aa‘=' 


iTI»e:»>TIV 


4.  THE  EXISTENCE  OF  AN  OPTIMAL  SAMPLING  RATE 


In this section we examine the behavior of the bound derived earlier, as a function of the sampling interval $T$. First, we note the following: A stationary process with a rational spectral density function has an infinite bandwidth, and an ideal reconstruction from the samples is impossible. However, there exists a sampling rate allowing a unique reconstruction of the process parameters from the ARMA parameters of the sampled process. This critical rate is determined by the requirement that all the discretized roots $\{e^{\lambda_m T}, 1 \le m \le n\}$ have phase angles in the range $[-\pi, \pi]$, i.e.

$$T < \min_{1 \le m \le n} \left\{ \frac{\pi}{|\mathrm{IM}(\lambda_m)|} \right\} . \qquad (32)$$

where $\mathrm{IM}(\cdot)$ denotes the imaginary part of the complex argument. If all the $\lambda_m$ are real, the sampling interval $T$ can be made arbitrarily large.
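The critical sampling interval in (32) is easy to evaluate once the roots of $\alpha(s)$ are known. A minimal sketch follows, assuming $\alpha(s)$ is supplied as a coefficient list in descending powers; the function name and the tolerance used to decide that a root is real are choices made for this illustration.

```python
import numpy as np

def max_sampling_interval(alpha, tol=1e-12):
    """Upper limit on T from (32): min over m of pi / |Im(lambda_m)|.

    Returns infinity when all roots of alpha(s) are (numerically) real.
    """
    lam = np.roots(alpha)
    imag = np.abs(np.imag(lam))
    if np.all(imag < tol):
        return np.inf
    # The minimum of pi/|Im| over the roots is attained at the largest |Im|.
    return float(np.pi / imag.max())

print(max_sampling_interval([1.0, 1.0, 1.0]))   # roots -0.5 +- j*sqrt(3)/2
print(max_sampling_interval([1.0, 3.0, 2.0]))   # real roots -1 and -2
```

For $\alpha(s) = s^2 + s + 1$ the limit is $\pi / (\sqrt{3}/2)$, roughly 3.63, while the polynomial with real roots imposes no limit.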


Let us now examine the case of a first-order rational spectrum, i.e.

$$S_x(s) = \frac{\beta^2}{(\alpha+s)(\alpha-s)}$$
The corresponding discrete-time spectrum is

where

In this case we get

A convenient scalar measure of the magnitude of the estimation error is the so-called generalized variance, which is the determinant of the error covariance matrix. This is bounded from below by the determinant of the CRB [4, p. 195]. We define

$$d(\theta^{(6)}) \triangleq |\mathrm{CRB}(\theta^{(6)})| \qquad (38)$$


For  the  first-order  case  we  get,  using  (36), (37) 


(39) 


Consider now the case in which the sampling interval $T$ varies, while the total number of samples $N$ remains fixed. The continuous-time parameters $\alpha$ and $\beta$ are also assumed to be fixed. For both $T \to 0$ and $T \to \infty$, the bound on the generalized variance goes to infinity, as can be verified by using L'Hospital's rule. Hence there exists a global minimum, which was evaluated numerically to occur at $T \approx 0.8\alpha^{-1}$. The behavior of $d(\theta^{(6)})$ as a function of $T$ is shown in Figure 1, where $\alpha = 1$ and where $d(\theta^{(6)})$ was normalized by its minimum value.

The sampling interval $T = 0.8\alpha^{-1}$ is optimal in the sense of minimizing the best achievable generalized variance of the estimated continuous-time parameters. We conclude that for first-order rational spectra, there exists an optimal sampling rate for reconstructing the parameters of the continuous-time process.
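The location of the first-order optimum can be reproduced numerically. The sketch below is an independent reconstruction, not the paper's own expressions: it assumes the sampled process is an AR(1) process with pole $\mu = e^{-\alpha T}$ and innovation variance $\sigma^2 = \beta^2 (1 - \mu^2) / (2\alpha)$, uses the standard large-sample Fisher information of a Gaussian AR(1) process in $(\mu, \sigma^2)$, and maps it to $(\alpha, \beta)$ through the Jacobian. The function names and the values $\alpha = \beta = 1$, $N = 100$ are illustrative.

```python
import numpy as np

def crb_det_first_order(T, alpha=1.0, beta=1.0, N=100):
    """Determinant of the CRB for (alpha, beta) from N samples at interval T."""
    mu = np.exp(-alpha * T)
    sigma2 = beta**2 * (1.0 - mu**2) / (2.0 * alpha)
    # Large-sample Fisher information of a Gaussian AR(1) in (mu, sigma2).
    fim_ar1 = np.diag([N / (1.0 - mu**2), N / (2.0 * sigma2**2)])
    # Jacobian d(mu, sigma2) / d(alpha, beta).
    jac = np.array([
        [-T * mu, 0.0],
        [beta**2 * (2.0 * alpha * T * mu**2 - (1.0 - mu**2)) / (2.0 * alpha**2),
         beta * (1.0 - mu**2) / alpha],
    ])
    fim_ct = jac.T @ fim_ar1 @ jac
    return 1.0 / np.linalg.det(fim_ct)

def closed_form(T, alpha=1.0, beta=1.0, N=100):
    """Simplified form of the same determinant under the same assumptions."""
    return beta**2 * (np.exp(2.0 * alpha * T) - 1.0) / (2.0 * N**2 * T**2)

T_grid = np.linspace(0.05, 5.0, 4951)          # step of 0.001
d = np.array([crb_det_first_order(T) for T in T_grid])
T_opt = T_grid[np.argmin(d)]
print("minimizing T for alpha = 1:", T_opt)
```

Under these assumptions the determinant reduces to $\beta^2 (e^{2\alpha T} - 1) / (2 N^2 T^2)$, which grows without bound as $T \to 0$ and as $T \to \infty$, and the grid search lands close to $T = 0.8\alpha^{-1}$, in line with the value quoted in the text.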


Higher-order cases appear to be too complicated to obtain closed-form expressions. However, the formulas derived in Sections 2 and 3 can still be used to evaluate the bound $d(\theta^{(6)})$ for any given values of $\{\alpha_m, \beta_m\}$. Let us consider two further examples. The first one is that of the second-order spectrum

$$S_x(s) = \frac{(1+s)(1-s)}{(1+s+s^2)(1-s+s^2)} \qquad (40)$$

The bound, normalized with respect to its minimum value, is shown in Figure 2. Again we observe the existence of an optimal sampling interval, which is about 0.9. The curve is relatively flat over the range $0.5 < T < 1.5$, and very steep outside this range.

For the next example we chose the fourth-order spectrum

$$S_x(s) = \frac{(1+s+s^2+s^3)(1-s+s^2-s^3)}{(2+4s+5s^2+3s^3+s^4)(2-4s+5s^2-3s^3+s^4)}$$

The normalized bound is shown in Figure 3. Here the optimal sampling interval is approximately 0.5, with a flat range of about $0.25 < T < 0.75$.


Finally, we illustrate the effect of the damping coefficient of the process on the behavior of the generalized variance. We take the second-order spectrum

$$S_x(s) = \frac{1}{(1+2\zeta s+s^2)(1-2\zeta s+s^2)} \qquad (42)$$

where $\zeta$ is the damping coefficient. Figures 4 and 5 show the normalized bound for $\zeta = 0.9$ and $\zeta = 0.1$ respectively. As we see, the flat region for $\zeta = 0.1$ is about twice as wide as the flat region for $\zeta = 0.9$. In other words, highly damped processes appear to be less sensitive to the choice of sampling rate than slightly damped processes.


5.  DISCUSSION 


We have derived closed-form expressions for the Cramer-Rao lower bound on the covariance and the generalized variance of the estimated parameters of a continuous-time rational spectrum from measurements of a uniformly sampled realization of the given process. We explored the dependence of the CRB on the sampling interval and demonstrated the existence of an optimal sampling interval, in the sense of minimizing the CRB of the generalized variance for a fixed number of measurements.

Since the optimal sampling interval depends on the process parameters, it is reasonable to ask whether the above mentioned phenomenon can be used in practice. A common practical situation is one in which a continuous-time process with slowly time-varying spectrum is sampled, and where batches of data are processed in succession, so as to track the time variation of the spectrum. An adaptive sampling-rate adjustment procedure can be incorporated in such situations, as follows. Each time a batch is processed and the process parameters are identified, the method described in this paper can be used to compute the optimal sampling rate, which is then used as the sampling rate for the next batch. If the time variation of the process parameters is sufficiently slow, this will result in a nearly optimal sampling rate.
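The batch-by-batch adjustment described above can be sketched for the first-order case, where the text gives the optimum $T \approx 0.8\alpha^{-1}$. Everything in the sketch is an assumption made for illustration: the process is simulated as a sampled first-order Gaussian process with spectrum $\beta^2 / ((\alpha+s)(\alpha-s))$, the pole is estimated by a simple least-squares fit rather than by any estimator named in the paper, and the function names, batch size, and starting interval are arbitrary.

```python
import numpy as np

def simulate_sampled_process(alpha, beta, T, N, rng):
    """N samples, spaced T apart, of a first-order Gaussian process."""
    mu = np.exp(-alpha * T)
    r0 = beta**2 / (2.0 * alpha)              # process variance
    sigma = np.sqrt(r0 * (1.0 - mu**2))       # innovation standard deviation
    y = np.empty(N)
    y[0] = rng.normal(0.0, np.sqrt(r0))
    for k in range(1, N):
        y[k] = mu * y[k - 1] + sigma * rng.normal()
    return y

def estimate_alpha(y, T):
    """Least-squares AR(1) fit, mapped back through mu = exp(-alpha * T)."""
    mu_hat = np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1])
    mu_hat = np.clip(mu_hat, 1e-6, 1.0 - 1e-6)   # keep the logarithm defined
    return -np.log(mu_hat) / T

def adapt_sampling_interval(alpha_true, T0, n_batches=5, N=5000, seed=0):
    """Re-tune T after every batch using the first-order rule T = 0.8 / alpha."""
    rng = np.random.default_rng(seed)
    T = T0
    history = [T]
    for _ in range(n_batches):
        y = simulate_sampled_process(alpha_true, 1.0, T, N, rng)
        alpha_hat = estimate_alpha(y, T)
        T = 0.8 / alpha_hat
        history.append(T)
    return history

history = adapt_sampling_interval(alpha_true=2.0, T0=0.05)
print(history)
```

With a fixed true $\alpha$ the interval should settle near $0.8/\alpha$ after the first batch; with slowly varying parameters the same loop would track the optimum with a one-batch lag.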

The same idea can be applied to an off-line processing of analog-recorded continuous-time signals. Here the sampling rate adjustment would be iterative rather than recursive, where at each iteration the same analog data is resampled.
Figure 3. The Normalized Bound as a Function of the Sampling Interval: A Fourth Order Process


